FUNCTION-BASED GENE DISCOVERY



1. FIELD OF THE INVENTION

The present invention relates generally to the field of genomics. More particularly, 
the present invention relates to methods for function-based gene discovery. Genes are 
identified as having or being associated with a specific function, as participating in a 
specific functional pathway, or as being a member of a specific functional group, by 
functional expression in one or more biological readout assays. This invention is based, at 
least in part, on the recognition that the signal-to-noise ratio of a readout assay used to 
screen a cDNA library can be significantly enhanced by methods which localize multiple 
molecular copies of each unique clone into discrete regions or compartments prior to 
functional expression. In one embodiment, this invention provides methods for in situ 
transfection of a sorted library in a "bar-coded" vector to carry out expression of genes from
libraries being screened in readout cells. It is the ability to detect a biological readout in a 
readout cell line which enables the user to identify genes having specific functions. The 
methods set forth herein are suitable for application in a high throughput format for 
identification of genes and their functions simultaneously. 



2. BACKGROUND OF THE INVENTION

In the past 25 years, approximately 5,000 human genes have been cloned in full 
length and characterized by specific biological functions through various assay systems.
This represents only about 5% of an estimated 100,000 different genes in the human 
genome. A state-of-the-art method for determining gene function is to first individually 
clone a full-length cDNA encoding a protein-of-interest and then to perform assays in an 
attempt to determine biological function of the cloned gene. This approach can be quite
expensive and inefficient due to the high cost of labor and materials for such work and 
because the entire process must be repeated from the beginning for each new gene to be 
characterized. In this regard, it generally takes several years for a skilled researcher to 
identify a new gene and characterize its corresponding function using methods focused on 
individual genes. With the advent of nucleic acid array technology to determine differential 
mRNA expression, the ability exists to analyze more than one gene at a time. For example, 
one might employ this method to select a subset of the 95,000 remaining genes to be 
characterized in a given tissue of interest. With this improvement alone, however, the speed 
of gene function discovery would remain painfully slow. One reason is that differential
mRNA expression analysis does not address the question of gene function in any depth. 
Instead, the technique may simply mark a gene as an interesting target worthy of further 
study. By contrast, functional expression in a heterologous background permits a gene to
display a detectable function. 

Some attempts have been made to develop systems for mammalian genetic 
functional screening (described further in Section 2.2 below). However, all of these 
attempts have involved detecting a positive signal through conferring a growth advantage. 
Such systems require weeks to grow cells under selective pressure, and the use of selection 
tends to result in the cloning of mutated genes. Of course, these systems are limited to 
functional identification of growth-related genes, and will not identify genes, for example, 
that are associated with cell death, that are toxic to a cell, or that cause other morphological 
changes. Accordingly, there is an urgent need for increased efficiency in the process of 
gene identification and functional characterization such that it not only takes less time but 
also yields more information. 



2.1. GENOMIC SCIENCES AND DRUG DISCOVERY 

There are about 40,000 prescription drug products currently available on the market,
including over 6,700 brand names and 1,600 FDA-approved drugs. Despite this large
number, drugs are known to work on only about 417 molecular targets in the human body
and fewer than 80 molecular targets in bacteria, viruses and parasites (see Drews, 1996, 
Genomic sciences and the medicine of tomorrow, Nature Biotechnology 14, 1516-1518). 
Of course, drugs achieve their desired effects by binding to specific cellular targets (e.g. 
receptors, ion channels, enzymes, and other proteins or molecules). For example, 
breakthrough drugs for hypertension, depression, migraine, schizophrenia, and ulcers all act 
via specific receptors (Drews, 1996, Id.). 

There are perhaps 100 to 150 major diseases in need of development of new
treatments (Drews, 1996, Id.). If the number of genes contributing to each of these complex 
disease phenotypes is five to ten, and if each gene product interacts with from three to ten 
other gene products, then the number of genes associated with these conditions is perhaps 
from 3,000 to 10,000 (Drews, 1996, Id.). All disease-associated genes may be considered to 
be potential drug intervention targets and/or diagnostic markers. By contrast to the 420 
known human molecular targets on which currently-available drugs are believed to work,
the majority of the 3,000 to 10,000 human disease-associated genes have not yet been
identified. From these considerations it is readily apparent that isolation and
characterization of remaining disease-associated genes will dramatically broaden the
horizon for development of new and/or improved treatments for most human diseases.

Identification and functional characterization of previously-unknown genes provides 
proteins which may be useful as drugs. Historically, such drugs have been useful when the 
body makes too little of an important protein, or when the presence of supplemental 
amounts of a protein can arrest or reverse a disease process. Protein drugs have been made
possible by genetic engineering which has enabled industrial-scale protein production.
Among the current beneficiaries of protein drugs are, e.g., individuals who have had heart 
attacks and receive clot-dissolvers, individuals with renal failure who receive erythropoietin 
for anemia, and individuals with diabetes who receive recombinant human insulin. 

Identification and functional characterization of new genes also provides reagents
for potential use in gene therapy. Gene therapy may be defined as the introduction of
genetic material into an individual for therapeutic benefit. Gene therapy may be used, for
example, to correct detrimental genetic changes that occur in tumor cells, or to direct an
individual's cells to produce a specific protein having therapeutic value. Although gene
therapy is still [remainder of sentence illegible in source]



Such disorders include those associated with single genes, such as hemophilia, sickle cell 
anemia, thalassemia, Gaucher's Disease, Huntington's chorea, and many others. More 
complex, polygenic diseases, such as diabetes and Alzheimer's disease, are also likely to 
benefit from gene therapy. 

In more complex conditions, such as dementia and severe obesity, several distinct
diseases may actually exist concurrently. In such cases, a condition may be mistakenly
classified as a single disease simply because medical science lacks the information and tools
necessary to distinguish among the underlying disease processes. If functional information
were available for most genes in the genome, it might become possible to accurately
identify each specific disease and a corresponding optimal therapeutic intervention within
such complex conditions.

Discovery of new genes and their functions permits development of diagnostics for 
early detection of diseases. Such diagnostics, in turn, permit timely use of drugs or other 






WO 99/55886 



PCT/US99/08823 



therapies for preventing irreversible damage. For example, current commercially-available 
gene-based diagnostics include tests for hemophilia A and B, phenylketonuria, 
retinoblastoma, and sickle-cell anemia. New, gene-based diagnostics may also be used to 
enhance the success rate of an existing therapeutic by identifying specific individuals within 
an affected group who respond well to a specific drug therapy. Similarly, diagnostics may
help in development of new therapeutics through enhancement of understanding of 
differences among people in response to various medicines. 

Existing expressed sequence tag (EST) databases are not, by themselves, sufficient 
to determine biological function. EST databases only suggest functional information to the
extent that an EST encodes a domain of known function. Such databases do not provide
any functional information for completely novel genes (i.e. genes not encoding any known 
domains or motifs). 

As mentioned above, the state-of-the-art for determination of gene function has been 
to first clone a full length cDNA and then pursue functional characterization on an 
individual gene basis. The time-consuming nature of the so-called single-gene approach can
be illustrated by examination of the progress made. By 1995, the rate of functional
characterization of newly-discovered genes reached a plateau of about 2,000 genes per year.
If this rate continues, it would take another 46 years to identify the function of the genes
remaining to be characterized in the human genome. The invention set forth herein
provides methods to accelerate this schedule considerably.

It is believed the most efficient way to accomplish this characterization is to 
combine information from total genome sequencing with a database on gene expression 
patterns and another database on biological function, so that most of the estimated 100,000 
genes encoded by the human genome can be grouped into a much smaller number of
multi-component, core processes of known biochemical functions. Following this approach, gene
groups, and then genes having strong medical relevance, would be prioritized for further, 
more thorough biological studies. 

In contrast to the single-gene approach employed by previously available 
technologies, the invention described herein provides high throughput methods which
combine the simultaneous isolation of gene structure with identification of gene function
and/or functional gene group. By doing so, the method is able to directly screen
mammalian cDNA libraries (average size 10^6 clones) using mammalian cell systems for
biological functions with specific cellular markers. With high resolution bioassay
technology, this strategy also yields all genes in the human genome which are involved in a 
particular biological functional process of interest. In addition, this strategy makes it 
possible to automate the gene function screening process in a high throughput fashion. 
Accordingly, this invention provides a much more efficient way to characterize the function
of the human genome, as described in detail in Section 5 below. 

A brief overview of functional genomics approaches currently under way serves to 
illustrate the state of the art. With regard to EST databases, Human Genome Sciences, Inc. 
(HGS) and Incyte Pharmaceuticals, Inc. (Incyte), have produced proprietary EST databases
comprising partial sequences of perhaps more than 70% of all human genes. Despite the
fact that these EST databases are not yet linked in a meaningful way to functional
information, seventeen of the largest pharmaceutical companies have spent more than $482
million to subscribe, according to a 1996 report (Friedrich, 1996, Nature Biotechnol. 14,
1234). Additional organizations have chosen positional cloning strategies for linking gene
structure with function. Still other organizations are applying nucleic acid array
technologies for analysis of expressed genes in a given tissue or cell (e.g. Affymetrix,
Incyte). 

Array technology, which represents the first attempt to go beyond single-gene
methods of genome analysis, is directed to characterization of gene expression as
opposed to characterization of gene function. For example, one may use array technology
to determine differential gene expression patterns in disease, thereby narrowing
disease-gene candidates to a subset of genes. However, even under this approach, the speed of gene
function discovery is not likely to increase significantly. This is so since such an approach
would identify not only genes which may contribute to the cause of a disease process but
also genes having altered expression as a consequence of a disease process. Further, the
genes in the latter category are likely to vastly outnumber those in the former
category. Analyzing potentially hundreds of genes that may be implicated in a given
disease by an expression analysis using a single-gene approach would quickly become an
overwhelming task. This is particularly evident when one considers that a given
organization generally has a limited number of biological assays available in-house, i.e. far
from enough for beginning to determine the biological function of new genes en masse.

It is clear that a genetic screen capable of identifying all genes associated with a
specific biological function would be the most efficient way of linking gene structure to
function. Although genetic screening approaches have been widely used for such organisms
as Drosophila and C. elegans, there is no such approach widely applicable to mammalian
systems. This is primarily due to the large size of mammalian genomes (i.e. 10^5 genes) and
a lack of sensitive assay systems for detecting positive signals over background noise. 

Nevertheless, limited attempts have been made at developing mammalian genetic 
functional screening systems. For example, such systems have been described by Deiss and 
Kimchi (1991, Science 252, 117-120) and by Cohen (1996, Cell 85, 319-329). However, 
these systems are slow, labor-intensive, restricted to cloning growth-related genes, and have
a tendency to isolate mutated genes. This latter tendency arises from a requirement for 
relatively long-term culture (i.e. two or more weeks) under selective pressure for 
identification of a growth phenotype. 

Accordingly, a great need exists for a large-scale (i.e. genome-wide) mammalian 
genetic functional screening method which may be employed over a time period of days
instead of weeks and which provides an automated, general format for use instead of a 
manual, specific format that must be tailored to each functional readout assay. This 
invention provides such a method, as described in detail in Section 5 below. 



2.2. EXPRESSION CLONING

Many methods have been described for cloning genes by functional expression. 
One method by Clarke et al. (June 23, 1987, Method for identification and isolation of 
DNA encoding a desired protein, U.S. Patent No. 4,675,285) provides a ten-step approach 
for selection of cDNAs expressed from sub-pools of a library which includes testing media
from cultured cells in which sub-pools are expressed so as to identify a cDNA encoding a
desired protein. Another method by King et al. (August 5, 1997, "Method of expression
cloning," U.S. Pat. No. 5,654,150) provides an improvement which employs pools of about
100 individual bacterial colonies. Yet another method by Sang (March 31, 1993,
"Expression cloning method," European Patent Application Pub. No. 0 534 619 A2)
employs antibodies or ligands to screen expression libraries. As a general proposition,
however, these methods have often been designed for very specific purposes, i.e. for
identification of a single gene, and therefore lack general utility. For example, one method
utilized a transient COS cell expression assay and monoclonal antibody binding to identify
CD28 (Aruffo and Seed, 1987, Proc. Natl. Acad. Sci. U.S.A. 84, 8573-8577).

3. SUMMARY OF THE INVENTION 

This invention provides methods for function-based gene discovery. Genes are
identified as having or being associated with a specific function, as participating in a
specific functional pathway, or as being a member of a specific functional group, by
functional expression in one or more biological readout assays. This invention is based, at
least in part, on the recognition that the signal-to-noise ratio of a readout assay used to
screen a cDNA library can be significantly enhanced by methods which localize multiple
molecular copies of each unique clone into discrete regions or compartments prior to
heterologous expression. In one embodiment, this invention provides methods for in situ
transfection of a sorted library in a "bar-coded" vector to carry out expression of genes from
libraries being screened in heterologous readout cells. It is the ability to detect a biological
readout in heterologous cells which enables the user to identify genes having specific
functions. The methods set forth herein are suitable for application in a high throughput
format for identification of genes and their functions simultaneously.

This invention provides a method for enhancing the signal-to-noise ratio of a
biological readout assay used to screen a bar-coded cDNA library comprising: (a) sorting
the bar-coded cDNA library using a nucleic acid array; and (b) transfecting the library
sorted in step (a) into a readout cell line in situ. In one embodiment, the nucleic acid array
is a biological array or a gene chip. In another embodiment, the biological array comprises
a vector carrying a plurality of complementary bar codes. In still another embodiment, the
plurality of complementary bar codes is immobilized on a support. In a preferred
embodiment, the support is nitrocellulose or nylon. In another embodiment, transfecting in
situ is carried out using a chemical transfectant or electroporation. In another embodiment,
the readout cell line is NIH 3T3 cells carrying a reporter gene under the control of a
response element or promoter. Selection of the response element or promoter is guided by
the particular readout assay selected. In still another embodiment, the reporter gene is
selected from the group consisting of β-galactosidase, luciferase and chloramphenicol
acetyltransferase. In a preferred embodiment, the response element or promoter is selected
from the group consisting of an NFkB response element, an NFAT response element, a
cyclic adenosine monophosphate response element, a STAT-inducible promoter, a
LEF-1-inducible promoter and a p53-inducible promoter. In another preferred embodiment, the
cDNA library is tetracycline-inducible or estrogen-inducible. In still another preferred
embodiment, the biological readout assay detects genes in a pathway selected from the 
group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFkB 
signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT 
signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling 
pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway. 

This invention provides a method for conducting a biological readout assay used to 
screen a bar-coded cDNA library comprising: (a) sorting the bar-coded cDNA library using 
a nucleic acid array; (b) transfecting the library sorted in step (a) into a readout cell line in 
situ; and (c) conducting the biological readout assay. In another embodiment, the nucleic 
acid array is a biological array or a gene chip. In another embodiment, the biological array
comprises a vector carrying a plurality of complementary bar codes immobilized on a
support. In another embodiment, the plurality of complementary bar codes consists of from
10^2 to 10^8 complementary bar codes. In another embodiment, the support is nitrocellulose
or nylon. In another embodiment, transfecting in situ is carried out using a chemical
transfectant or electroporation. In another embodiment, the readout cell line is NIH 3T3
cells carrying a reporter gene under the control of a response element or promoter. In
another embodiment, the reporter gene is selected from the group consisting of
β-galactosidase, luciferase and chloramphenicol acetyltransferase. In another embodiment,
the response element or promoter is selected from the group consisting of an NFkB
response element, an NFAT response element, a cyclic adenosine monophosphate response
element, a STAT-inducible promoter, a LEF-1-inducible promoter and a p53-inducible
promoter. In another embodiment, the bar-coded cDNA library is tetracycline-inducible or
estrogen-inducible. In another embodiment, the biological readout assay is capable of
detecting genes in a pathway selected from the group consisting of a mitogenic signaling 
pathway, a STAT signaling pathway, an NFkB signaling pathway, a stress signaling 
pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling 
pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling 
pathway and an anti-proliferation signaling pathway. 

This invention provides a method for conducting a biological readout assay used to
screen a bar-coded cDNA library comprising: (a) sorting the bar-coded cDNA library using
a nucleic acid array having a plurality of concave loci; (b) expressing the bar-coded cDNA
library sorted in step (a) using in vitro transcription and translation to produce a population
of proteins; and (c) screening the population of proteins produced in step (b) for a
biochemical activity-of-interest, so as to conduct the biological readout assay. In one
embodiment, the biochemical activity-of-interest screened in step (c) is selected from the
group consisting of a receptor-binding activity, a ligand-binding activity and a growth factor
activity. In another embodiment, screening is carried out by immobilizing the population of
proteins on a solid support for use in a binding assay. In another embodiment, the solid
support is nitrocellulose or nylon. In another embodiment, screening is carried out by 
placing the population of proteins in contact with readout cells for use in a biological 
activity assay. 

This invention provides a method for identifying one or more genes-of-interest in a 
pre-sorted cDNA library comprising: (a) transfecting the pre-sorted cDNA library into a
population of readout cells; and (b) screening the population of readout cells transfected in a
biological readout assay, to identify one or more genes-of-interest.
In one embodiment, the pre-sorted cDNA library comprises a bar-coded cDNA library
hybridized to a nucleic acid array. In another embodiment, transfecting is carried out using
chemical transfectants or electroporation. In another embodiment, the biological readout
assay identifies one or more genes-of-interest in a pathway selected from the group 
consisting of a mitogenic signaling pathway, a STAT signaling pathway, an NFkB 
signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT 
signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling 
pathway, a proliferation signaling pathway and an anti-proliferation signaling pathway. 

This invention provides a method of expression cloning one or more
genes-of-interest in a cDNA library comprising: (a) sorting the cDNA library; (b) transfecting the
sorted library into a readout cell line; and (c) identifying a positive signal from the
transfected library in a biological readout assay, so as to expression clone one or more
genes-of-interest in the cDNA library. In one embodiment, sorting the cDNA library is
carried out using a nucleic acid array. In another embodiment, transfecting the sorted
library is carried out using chemical transfectants or electroporation. In another embodiment,
the positive signal is identified by immunocytochemistry. In another embodiment, the
biological readout assay identifies one or more genes-of-interest in a pathway selected from
the group consisting of a mitogenic signaling pathway, a STAT signaling pathway, an
NFkB signaling pathway, a stress signaling pathway, an apoptosis signaling pathway, an
NFAT signaling pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1
signaling pathway, a proliferation signaling pathway and an anti-proliferation signaling
pathway.

This invention provides a method of sorting a cDNA library for use in an expression 
cloning assay comprising: (a) cloning a population of cDNA inserts into a population of
bar-coded vectors; (b) preparing the population of bar-coded vectors for hybridization to a
DNA array by exposing only the bar code region in single-stranded form; and (c)
hybridizing the population of bar-coded vectors to a nucleic acid array to sort the cDNA
library. In one embodiment, the nucleic acid array is selected from the group consisting of a
gene chip and a biological array. In another embodiment, preparing the population of
bar-coded vectors for hybridization to a DNA array by exposing only the bar code region in
single-stranded form in step (b) is carried out using the following steps in the order stated:
(a) digesting the population with a restriction endonuclease to linearize the population; (b)
binding a DNA-binding protein to at least two sites on the population; and (c) digesting the
population bound in step (b) to expose the single-stranded bar code region. In another
embodiment, the DNA-binding protein is selected from the group consisting of a lactose
repressor protein, a tetracycline repressor protein, E2F, AP1, SP1 and p53. In another
embodiment, the restriction endonuclease is selected from the group consisting of NotI, SfiI
and EcoRI. In another embodiment, digesting the vector population in step (c) is carried out
using an enzyme selected from the group consisting of exonuclease III, T4 DNA
polymerase, Klenow fragment, T7 DNA polymerase, Vent DNA polymerase and Pfu DNA
polymerase.






4. BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1. A phagemid vector for making a bar-coded cDNA library.

FIG. 2A-2B. 2A. Preparation of a bar-coded vector. 2B. Preparation of a bar-coded
cDNA library.

FIG. 3. Sorting a bar-coded cDNA library.

FIG. 4. Flow chart of gene identification methods from the step of in situ
transfection to the step of cDNA retrieval.

FIG. 5. Illustration of a gene chip with a plurality of concave loci. 

5. DETAILED DESCRIPTION OF THE INVENTION 

This invention provides methods for function-based gene discovery. Genes are
identified as having or being associated with a specific function, as participating in a
specific functional pathway, or as being a member of a specific functional group, by
expression in one or more biological readout assays. This invention provides expression
cloning methods enabling high-throughput library screening for determination of gene
function. The invention is based, at least in part, on the recognition that the signal-to-noise
ratio of a readout assay used to screen a cDNA expression library can be significantly
enhanced by localizing multiple molecular copies of each unique clone into discrete regions
or compartments. It is the ability to detect a biological readout in heterologous cells which
enables the user to identify genes having specific functions. A major advantage of the
invention is to provide methods for assaying all genes in a cDNA expression library
simultaneously, instead of one-at-a-time, under conditions in which the readout
signal-to-noise ratio is significantly enhanced. Moreover, a rational basis for characterization of
functional gene groups is provided where more than one gene is identified in any given
readout assay.
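
As a rough illustration of why localization may help, the sketch below compares a pooled transfection with a sorted one. It is a toy model and not part of the disclosed method. It rests on two assumptions that do not come from the text above: each readout cell takes up a fixed number of plasmid copies (the hypothetical parameter plasmids_per_cell), and those copies are drawn uniformly at random from whatever mixture of equally abundant clones is present at the cell's location. The function name and the numbers used are illustrative only.

```python
def fraction_of_cells_with_clone(clones_in_mix: int, plasmids_per_cell: int) -> float:
    """Probability that one cell receives at least one copy of a given clone.

    Toy model (assumption): each cell takes up `plasmids_per_cell` copies,
    each drawn uniformly at random from `clones_in_mix` equally abundant clones.
    """
    if clones_in_mix < 1 or plasmids_per_cell < 0:
        raise ValueError("clones_in_mix must be >= 1 and plasmids_per_cell >= 0")
    return 1.0 - (1.0 - 1.0 / clones_in_mix) ** plasmids_per_cell


# Pooled library: 10**6 clones mixed together over the whole cell population.
pooled = fraction_of_cells_with_clone(clones_in_mix=10**6, plasmids_per_cell=100)

# Sorted library: each locus holds many copies of a single clone.
sorted_locus = fraction_of_cells_with_clone(clones_in_mix=1, plasmids_per_cell=100)

print(f"pooled: {pooled:.2e}  sorted locus: {sorted_locus:.2f}")
```

Under these assumptions, roughly one cell in ten thousand carries a given clone in the pooled case, whereas every transfected cell at a sorted locus carries it. A real assay would also depend on transfection efficiency and background signal, neither of which is modeled here.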

In one embodiment, this invention provides methods for in situ transfection of a
sorted library in a "bar-coded" vector to carry out expression of genes from libraries being
screened in heterologous readout cells. The vector "bar code" is an oligonucleotide
sequence within the vector which is unique to each individual clone of a library. The bar
code enables sorting of the library in physical space by hybridization to nucleic acid arrays
which are complementary to library bar code sequences. The bar code unique to each clone,
together with the unique position of each complementary bar code in a nucleic acid array,
provides a method for direct retrieval of a gene having a function of interest identified in
any given readout assay. Moreover, each unique bar code can serve as a specific primer for 
PCR and/or sequencing of a desired clone in a library. 
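
The bookkeeping behind bar-code sorting and retrieval can be sketched in a few lines of code. The example below is a minimal illustration, not the disclosed protocol: hybridization is reduced to an exact reverse-complement lookup, and the clone names, the six-base bar codes, the two-column layout and all function names are hypothetical. It also computes the shortest bar code length n for which 4^n distinct sequences cover a library of a given size, which is a counting bound only; practical bar codes would likely need to be longer to hybridize specifically.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def min_bar_code_length(n_clones: int) -> int:
    """Smallest length n such that 4**n distinct bar codes cover n_clones."""
    n = 1
    while 4 ** n < n_clones:
        n += 1
    return n


def build_array(bar_codes, n_columns: int) -> dict:
    """Map each complementary probe sequence to a unique (row, column) locus."""
    return {
        reverse_complement(code): divmod(index, n_columns)
        for index, code in enumerate(bar_codes)
    }


def sort_library(library: dict, array: dict) -> dict:
    """Place each clone at the locus whose probe complements its bar code."""
    loci = {}
    for clone_id, bar_code in library.items():
        locus = array.get(reverse_complement(bar_code))
        if locus is not None:  # clones without a matching probe stay unsorted
            loci[locus] = clone_id
    return loci


# Hypothetical three-clone library: clone name -> bar code (made-up sequences).
library = {"clone_A": "ACGTAC", "clone_B": "TTGACA", "clone_C": "GGATCA"}
array = build_array(library.values(), n_columns=2)
loci = sort_library(library, array)

# A positive readout at locus (0, 1) leads directly back to one clone,
# whose bar code can then serve as a PCR or sequencing primer.
hit = loci[(0, 1)]
print(hit, library[hit])
print(min_bar_code_length(10**6))
```

In this sketch a positive signal at row 0, column 1 resolves to clone_B, and a library of 10^6 clones needs bar codes of at least 10 bases by the counting bound.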

5.1. GENERAL CONSIDERATIONS 

In both above-mentioned embodiments of the invention, it is the ability to detect a
biological readout upon heterologous expression which enables the user to identify genes
having specific functions. These embodiments are described in detail in Section 5.2 below.
A major advantage of the invention is to provide methods for assaying all genes in a cDNA
library simultaneously for the ability to modify a specific biological function associated
with a specific readout assay. The pattern of gene activity in any given readout assay also
provides a rational method for identification of functional gene groups. The methods set
forth are suitable for application in a high throughput format for rapid identification of
genes and their functions. For example, such a high throughput format may easily screen at
least 10^[exponent illegible in source], or from 10^4 to 10^6, independent recombinants for
functional activity at one time.

To practice the invention, a complementary DNA (cDNA) library is prepared from
messenger RNA (mRNA) obtained from a cell population of interest (e.g. a cell population
derived from a specific tissue, disease, or biological state). A cDNA library may
also be purchased commercially. The cDNA is operably linked to an expression vector
suitable for use with the invention. Constructs are prepared and purified using standard
recombinant DNA techniques as described in, e.g., Sambrook et al. (1989, Molecular
Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, New York). Expression vectors suitable for use with the invention are available 
commercially, or can be specially designed by the user or as described herein. 

5.1.1. OVERVIEW

The library technology of the invention has, inter alia, the following three features: 
(a) inducibility of library gene expression; (b) suitability for use with sense and antisense 
libraries; and (c) suitability for use with libraries from various disease and non-disease 

tissues and/or cells, and/or various stages of development of interest. The user will note
that when a bar-coded vector is used virtually any cDNA library vector is made suitable 
when modified to comprise a bar code as described herein. Such vectors do not require an 
inducible promoter because these cDNA libraries are directly transfected into readout cells 
without having to be propagated in virus-producing cells. 

The method is suitable for use with a microscope-based, in situ approach for
detection of various readouts. Such readouts may include, but are not limited to: target
protein expression; target mRNA expression; cellular localization changes; and/or cellular
morphology changes. Such single-format, microscope-based detection may be easily
automated. Because screening of libraries and/or sub-libraries requires only a few days, the
possibility of appearance of mutated genes during prolonged growth in cell culture, as with
prior art methods, is largely eliminated.

The methods of the invention are suitable for use in high throughput screening of a 
large number of functional (i.e. readout) assays of interest. Such functional assays include
cell culture-based assays (i.e. cellular readout assays, see Section 5.5) that rely upon
expression of genes in a library and detection of a functional effect of expression. This
accelerated time scale provides a two-fold advantage in that it (a) requires a reduced
workload relative to procedures requiring longer assay time; and (b) vastly reduces the
appearance of mutated genes arising from prolonged cell culture time. The functional assay
technology that can be used includes all existing immunostaining assays and biochemical
assays. See Sections 5.5-5.7 below for a description of assays and Section 6 below for
examples. Such assays are designed to identify genes involved in major disease categories 
as well as genes that regulate various cellular physiological functions.

5.1.2. mRNA SOURCES
There are no special considerations when choosing a messenger RNA (mRNA) 
source for construction of a cDNA library for use with the methods of the invention. Any 
mRNA source may be used. Accordingly, cells suitable as sources of mRNA from which a 
cDNA library may be constructed include, but are not limited to, mammalian cells, bacterial
cells, yeast cells, insect cells and amphibian cells. However, because of (a) the relative
absence of genetic functional screening systems available for mammalian organisms
compared to, say, flies or yeast, and (b) the relative complexity of the mammalian genome,
the methods of the invention are preferred for use in screening mammalian cDNA libraries.
Suitable mammalian mRNA sources include tissues and cell lines. Mammalian tissues that
may be used include normal and disease tissues (e.g. carcinomas, lymphomas). Mammalian
cell lines that may be used include any of the cell lines available from the American Type
Culture Collection (ATCC). Exemplary mammalian cell lines include Chinese hamster
ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) cells, monkey kidney cells (e.g.
COS), human hepatocellular carcinoma cells (e.g. Hep G2), human embryonic kidney cells
(e.g. HEK 293), mouse Sertoli cells, canine kidney cells (e.g. MDCK), buffalo rat liver cells, 
human lung cells, human liver cells and mouse mammary tumor cells. 

5.1.3. cDNA LIBRARIES

Sense or antisense cDNA libraries may be generated by any method known in the
art. Many such methods exist and examples may be found in Sambrook et al. and Ausubel
et al., both of which are incorporated by reference herein in their entireties (Sambrook et al.,
1989, Molecular Cloning, A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, New York; Ausubel et al., eds., in the Current Protocols in
Molecular Biology series of laboratory technique manuals, © 1987-1997 Current Protocols,
© 1994-1997 John Wiley and Sons, Inc.). Many references are available which describe
antisense cDNA library construction (see e.g., Spann et al., 1996, Proc. Natl. Acad. Sci.
U.S.A. 93, 5003-5007; and Deiss and Kimchi, 1991, Science 252, 117-120).

The library may be an antisense library such that antisense polynucleotides are
generated upon expression of the library. Such antisense polynucleotides may, for example,
provide a source of inhibition of a detectable cellular event in a functional assay. In this
way, an antisense expression library will identify one or more genes required for operation
of a specific pathway by "knocking out" (i.e. rendering inoperative) such a pathway.

A library may be divided into subpools for screening. For example, from 100 to
1,000 subpools may be generated, each subpool comprising a cDNA diversity of from about
10,000 to about 1,000 clones, thus representing a cDNA diversity in all pools combined of
about 1,000,000. Each library subpool may be individually (i.e. separately) expressed in a
heterologous cell population.

A library may be a normalized cDNA library. Any cDNA library normalization
technique known to one skilled in the art may be used with the methods of the invention.
For example, see "Normalization and subtraction: two approaches to facilitate gene
discovery," Genome Research 6, 791-806 (1996).



5.2. BAR-CODED VECTORS

A "genetic bar code" is an oligonucleotide tag or label having a specific sequence.
This invention provides a method of constructing a cDNA library in a vector containing a
plurality of genetic bar codes at a diversity equal to or larger than the diversity of the cDNA
library. This invention provides methods for sorting and transfecting such a library. The
methods employ a unique genetic bar code linked to each clone in a library for various uses
(e.g. sorting, retrieval of insert).

The human genome is believed to encode about 100,000 genes, and any given
human cell or tissue may express from about 10,000 to about 50,000 of these genes.
Therefore, in order to cover every expressed gene (including rare genes) during preparation
of a human cDNA library from messenger RNA, it is preferred that about 10^6 independent
clones be used. Accordingly, a vector having about 10^6 unique genetic bar codes is
preferred since it is preferred that a unique genetic bar code be associated with each library
clone.

A bar code having ten nucleotides and using all four possible bases at each position
is capable of generating a set of genetic bar codes having a diversity of 4^10 or 1.048 x 10^6.
The optimum length and base composition of oligonucleotides (oligos) for specific and
efficient hybridization to complementary sequences may be chosen by the user. For
example, oligos 15 to 20 nucleotides long having a diversity of 4^15 to 4^20 (i.e., 10^9 to 10^12)
may be used to cover a library of 10^6 diversity to ensure that any two genetic bar codes are
different by several nucleotides. Various approaches known in the art may also be used to
reduce cross-hybridization among different bar codes (see e.g. Shoemaker et al., 1996,
Nature Genetics 14, 450-456).
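
By way of illustration only, the diversity figures given above can be checked with a short Python sketch. The function name barcode_diversity and the constant LIBRARY_SIZE are hypothetical and are introduced here solely for the example.

```python
def barcode_diversity(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct bar codes of a given length (hypothetical helper)."""
    return alphabet_size ** length


LIBRARY_SIZE = 10**6  # assumed library diversity, per the preferred clone number

for n in (10, 15, 20):
    d = barcode_diversity(n)
    # A large excess of possible bar codes over clones leaves room to keep
    # only bar codes that differ from one another by several nucleotides.
    print(f"{n}-mer: {d:.3e} bar codes, {d // LIBRARY_SIZE} per library clone")
```

A ten-nucleotide bar code barely covers a library of 10^6 clones, whereas 15-mers to 20-mers provide a surplus of three to six orders of magnitude.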

5.2.1. CONSTRUCTING A BAR-CODED VECTOR

The vector employed for use with a bar-coded cDNA library may be any vector
incorporating a genetic bar code. In one embodiment, a suitable vector comprises the
genetic bar code and a eukaryotic promoter. In another embodiment, a suitable vector
comprises the genetic bar code, a eukaryotic promoter (e.g. a CMV promoter), a cDNA
insert, an f1 origin, an antibiotic resistance gene (e.g. ampicillin resistance gene), an SV40
origin and a ColE origin (see e.g. FIG. 1). For the phagemid vector illustrated in FIG. 1,
sites 1 and 2 may be used for inserting the genetic bar code, sites 3 and 4 may be used for
inserting cDNA, the f1 origin may be used for making single-stranded DNA, and the
antibiotic resistance gene provides for growth selection of the phagemid vector in E. coli.
The ColE and SV40 origins may be used to provide high copy number amplification of
phagemid DNA in bacteria and eukaryotic cells, respectively (see FIG. 1).

By way of example and not limitation, a bar-coded library vector may be 
constructed as illustrated in FIG. 2A. Here, the vector and the bar code mixture are each
digested with enzymes 1 and 2 (which cut at sites 1 and 2, respectively) and ligated together to
produce the bar-coded library vector (FIG. 2A). A bar-coded library may be constructed as
illustrated in FIG. 2B. Messenger RNA (mRNA) is reverse transcribed using an oligo-dT 
primer containing restriction site 3 using methods well known to those skilled in the art (see 
FIG. 2B). Following conversion into double-stranded cDNA, an adapter or linker 
containing restriction site 4 is ligated to the 5' end (relative to the sense strand). The 
resulting double-stranded cDNA bears site 4 at its 5' end and site 3 at its 3' end. This cDNA
and the bar-coded vector are each digested with enzymes 3 and 4 and ligated together to
produce the double-stranded, bar-coded cDNA library. It is preferred that any library
amplification be performed on plates, as opposed to in solution, to ensure equal
amplification of all clones represented.

One skilled in the art will readily recognize and appreciate that the various features
of a suitable vector need not be precisely as illustrated. For example, the location of the bar
code sequence can be other than at the illustrated location within the vector.

To ensure that each bar-coded vector only carries one genetic bar code, a population
of double-stranded genetic bar codes is designed in dephosphorylated form and having a
staggered restriction enzyme site at both ends of the bar code (e.g. EcoRI). For example,
chemically synthesized oligonucleotides are already dephosphorylated following
chemical synthesis. An enzyme (e.g. alkaline phosphatase) can also be used to ensure the
dephosphorylated state. This method ensures that, after annealing of the two bar code
strands, a double-stranded bar code is formed which lacks the phosphorylation which would
be necessary for the formation of a bar code dimer.

Further, one can apply a "zero background" cloning system (e.g., such a system is
commercially available from Invitrogen) to clone a bar code population into the chosen
vector. A zero background cloning system is a positive selection system for prokaryotic
cloning which works by direct selection of inserts via disruption of the lethal gene ccdB
(control of cell death). In such a system, only bacteria transformed with a genetic bar code
inserted into the vector will survive and be propagated. In this way, the vector population
generated will not contain any individual vectors lacking a bar code.



5.2.2. SORTING A BAR-CODED LIBRARY 

Sorting of a bar-coded library may be carried out using supports having bound
thereto oligonucleotide sequences that are complementary to the genetic bar codes of the
cDNA library. A DNA sequence complementary to each genetic bar code in a bar-coded
library is affixed (e.g. deposited or synthesized) at discrete locations of a nucleic acid array. 
Natural or modified nucleotides can be used for synthesis. Use of certain modified 
nucleotides may promote formation of stronger bonds with the complementary bar code of a 
vector. In this regard, the bonding properties of common modified nucleotides such as 
phosphorothioates have been well described. 

An array may be a commercially available gene chip (e.g. Affymetrix, Incyte) or
may be manufactured using methods known in the art. Many such methods have been
described (for a brief review, see Ramsay, January 1998, Nature Biotechnol. 16, 40-44).
For example, light-directed, solid-phase synthesis technology permits massive numbers of
oligonucleotides to be synthesized on a support at precise positions (see e.g. Fodor et al.,
August 29, 1995, U.S. Pat. No. 5,445,934). To achieve gene separation and sorting using
such an array, the array is hybridized with the single stranded genetic bar code region of a
double stranded vector (see e.g. "ssGBC cDNA library" in FIG. 3, which illustrates such a
single stranded genetic bar code region on a double stranded vector). In FIG. 3, a gene chip
having a plurality of complementary genetic bar codes is shown. Each area of the chip
(labeled A, B, C, and N) contains multiple copies of each unique complementary bar code.
Following hybridization, each unique cDNA corresponding to each unique bar code is
sorted into a discrete area on the chip (see bottom panel of FIG. 7).

Optimum hybridization conditions are used to ensure accurate base pairing between 
the various genetic bar codes of a library and their corresponding complements in a nucleic 
acid array so as to prevent or minimize mismatches. The hybridization (a) separates library 
vector molecules encoding distinct recombinants from each other and (b) sorts all library
vector molecules encoding the same recombinant to discrete, known locations. This
separation and sorting operation results in an equal amount of DNA at each location.
Abundant genes of a library will be represented at multiple locations on a chip while rare
genes will be represented at only one or a few locations. Equivalent amounts of DNA are
hybridized at each location.
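
The separation and sorting operation described above can be sketched in Python as an idealized model in which a vector binds only the array location whose probe is the exact reverse complement of its bar code. The names sort_library and array_probes, the example bar codes and the insert labels are all hypothetical; real hybridization tolerates some mismatch and is not captured here.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def sort_library(array_probes, library):
    """Assign each (bar_code, insert) vector to the array location whose
    probe is the exact reverse complement of its bar code (idealized)."""
    location_of = {probe: loc for loc, probe in array_probes.items()}
    sorted_clones = defaultdict(list)
    for bar_code, insert in library:
        loc = location_of.get(reverse_complement(bar_code))
        if loc is not None:  # vectors without a matching probe stay unbound
            sorted_clones[loc].append(insert)
    return dict(sorted_clones)


# Hypothetical bar codes (no C) and a three-location array.
bar_codes = ["TGAGGTAGGA", "AGTGGAGATG", "GATGAGTGAG"]
array_probes = {loc: reverse_complement(bc) for loc, bc in zip("ABC", bar_codes)}

# An abundant cDNA was cloned twice (two different bar codes); one of those
# clones is present in two molecular copies. The rare cDNA was cloned once.
library = [
    (bar_codes[0], "abundant cDNA"),
    (bar_codes[1], "abundant cDNA"),
    (bar_codes[2], "rare cDNA"),
    (bar_codes[0], "abundant cDNA"),
]
print(sort_library(array_probes, library))
```

In this model the abundant cDNA occupies locations A and B while the rare cDNA occupies location C only, mirroring the representation of abundant and rare genes described above.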

As an alternative to gene chip arrays, which may be expensive to manufacture, a 
"biological array" may be manufactured and used to sort a bar-coded expression library. 
Such a biological array is created from a library of sequences which are complementary to 
the bar codes of the expression vector. The biological array is segregated into discrete
locations using the physiology of the microbe (e.g. bacteria or yeast), as follows. Since
each microbe which takes up a plasmid DNA containing a complementary bar code will 
only retain a single type of plasmid, each complementary genetic bar code is automatically 
separated from all others at this step. The user will note that the vector chosen to produce 
the complementary bar code array is different from the expression vector used to create the 
library so as to preclude any hybridization between the two vectors. 

In this way, microbial colonies carrying a plasmid library encoding complementary 
bar codes are used to construct a biological array. Such a biological array can be easily 
reproduced by replica plating. The DNA of the array is easily immobilized on a solid
support (e.g. nitrocellulose, nylon, etc.) by well known methods. The sequence of the
complementary genetic bar code at each location of the array may be determined by
standard sequencing reactions. This information may then be stored in a computer for later
retrieval as needed.

A biological array differs from a gene chip array as follows. Instead of
having each complementary bar code present as a homogeneous nucleic acid at each
location of the array, each is present in a background of other DNA (i.e. from the plasmid
carrying the complementary bar code and the microbial genome).

5.2.3. IN SITU TRANSFECTION OF A SORTED LIBRARY

Analysis of a sorted library by functional expression provides a means to screen a 
large number of genes simultaneously in order to identify genes having the biological 
function specified by the chosen readout assay. Any of the following methods may be used 
for in situ transfection of a sorted library. Following all of the transfection procedures 
described below, readout cells are rinsed in a physiologic buffer and cultured for a period of 
time to allow expression of the transfected genes. The period of time to allow expression 
may be from 12 hours to 12 days, from 1 day to 6 days, or from 2 days to 3 days. In a 
preferred embodiment, the period of time is from 1 day to 4 days. 

In one embodiment, in situ transfection may be performed using a gene chip having 
a plurality of concave loci (i.e. U-shaped areas), each locus having an oligonucleotide 
complementary to a genetic bar code attached thereto. Following library sorting by 
hybridization, such a gene chip will have an individual cDNA recombinant at each locus.
In situ transfection is performed by contacting the gene chip to a readout cell line in the
presence of a solution which facilitates the release of the hybridized recombinants. Such a
solution may be, e.g., phosphate-buffered saline or tissue culture medium without serum
supplement. Generally, any low-salt solution (e.g. 150 mM NaCl or lower) will result in the
dissociation (i.e. release) of the hybridized recombinants from the gene chip. Chemical
transfectants may also be included in the solution to facilitate uptake of the released DNA
into the readout cells. Such transfectants may be any transfectant known in the art. For
example, calcium phosphate, DEAE-dextran, polybrene or a lipid-based transfectant such as 
LT1 (Panvera) or Lipofectamine (GibcoBRL) may be used. 

In another embodiment, in situ transfection may be performed using a biological 
array or a gene chip having a flat surface. Here, in situ transfection may be performed by 
contacting the biological array or the gene chip to a readout cell line in the presence of a
solution as described above. A chemical transfectant may also be used as described above. 
In a preferred embodiment, a micro-compartmentalization grid device may also be used to 




WO 99/55886 



PCT/US99/08823 



restrict diffusion of each released recombinant. Such a micro-compartmentalization grid 
device may be as illustrated in FIG. 3 of copending U.S. Patent Application No. 
09/065,776, filed April 24, 1998, entitled "MICRO-COMPARTMENTALIZATION 
DEVICE AND USES THEREOF," by Cen and Sun (Attorney Docket No. 9557-003), 
which is incorporated herein by reference in its entirety.

In yet another embodiment, in situ transfection may be performed by 
electroporation. Electroporation may be performed using, e.g., a cell culture device as 
described in U.S. Patent No. 5,134,070, which is incorporated by reference herein in its 
entirety. Here, the readout cell line used for electroporation may be any readout cell line
which will attach to and proliferate on a solid support. Such a cell line may be grown in a
monolayer on the bottom of a cell culture device which is electrically conductive (see e.g.
U.S. Patent No. 5,134,070). A gene chip or biological array having a sorted cDNA library
attached thereto is contacted with the cell line in the presence of a suitable electroporation
solution (e.g. phosphate buffered saline or as described in U.S. Patent No. 5,134,070) such
that it is between the electrically conductive upper and lower surfaces of the culture device.
The contact of the upper surface of the culture device with the electroporation solution 
provides a continuous electric circuit for passing a current which mobilizes the DNA from 
the gene chip or biological array to the readout cells. 

In yet still another embodiment, in situ transfection may be performed using
enzymes to facilitate attachment to and release from a gene chip or a biological array. Here,
one may covalently attach a sorted library to a nucleic acid array to ensure tight binding
using T4 ligase, or an enzyme having a similar activity, to ligate the oligonucleotides
encoding the complementary bar codes to the bar-coded cDNA library following
hybridization. Such a covalently-bound bar-coded cDNA library may be released at will by
including a restriction endonuclease in the transfection solution (e.g. if an EcoRI site is
used as site 2, see FIG. 1, then one may cut with EcoRI).

Finally, to ensure specific hybridization of a bar-coded cDNA library to a nucleic 
acid array, enzymes which cut mismatched nucleotides (such as T4 Endonuclease VII, also 
called resolvase, see Youil et al., 1996, Genomics 32:431) may be used to eliminate
cross-hybridization following the completion of the hybridization process.

An overview of the in situ transfection (gene transfer) methods, as they relate to the
overall process of gene identification and retrieval, is illustrated schematically in FIG. 4.

5.2.4. BIOCHEMICAL ANALYSIS OF A SORTED LIBRARY

In addition to expression of a sorted cDNA library in one or more cellular readout 
assays, biochemical analysis of a sorted library may also be performed. A solid support 
having a plurality of concave loci may be used, as described above and depicted in FIG. 5. 
For biochemical analysis, each individual protein encoded by the sorted library is first
expressed using the well-known techniques of in vitro transcription and translation. Since
each individual protein expressed is compartmentalized at a discrete locus, each may be
screened for any biochemical activity of interest (e.g. receptor-binding activity,
ligand-binding activity, growth factor activity). See Section 5.6 below for a description of
various biochemical readout assays.

Individual proteins may be subsequently immobilized on a solid support (e.g. 
nitrocellulose or nylon) for use in binding or other assays. Alternatively, individual proteins 
may be left free within U-shaped wells for subsequent assay of activity. For example, for 
detection of growth factor activity, in vitro translation products may be placed in contact
with readout cells using mild centrifugation to transfer the contents of each U-shaped locus
onto a readout cell grid. 

5.2.5. GENE RETRIEVAL AND MONITORING 

Multiple methods are available for gene retrieval using
genetic bar codes. Following identification of a specific clone-of-interest in a readout assay,
the unique genetic bar code situated in the vector next to the cDNA insert may be used to 
isolate the clone-of-interest. For example, localized releasing of DNA hybridized on a 
nucleic acid array may be carried out by competition using an oligonucleotide identical to 
the bar code of interest. Further, isolation of the clone-of-interest may be carried out by
polymerase chain reaction (PCR) to amplify only insert cDNA linked to the identified
genetic bar code. Under this approach, for example, the bar code sequence may be used as a
specific primer together with a suitable vector primer and total library DNA as template.
The primer of the genetic bar code can also be used for isolating the specific plasmid by a
procedure referred to as "gene trapping" (Gibco BRL). Briefly, "gene trapping" is a method
for rapid isolation of cDNA clones from single stranded DNA prepared from a library. This
method is based on isolating cDNA clones which hybridize with a biotinylated
oligonucleotide complementary to a cDNA of interest (see Le et al., 1995, Focus 17, 45).
Still further, isolation of the clone-of-interest may be carried out by physical "picking" since
the location of each unique bar code sequence within an array is generally known.

The genetic bar code technology not only enables performing cellular functional 
assays on all genes represented in a cDNA library simultaneously, but is also amenable to 
automation. Such automation allows many functional assays to be performed in a single
format, as described further in Section 5.8 below. 

5.2.6. GENETIC BAR CODE DESIGN 

To facilitate uniform hybridization between a bar coded library and its
complementary sequences of a nucleic acid array, it is preferred that all oligonucleotides
(i.e. genetic bar codes) have the same or nearly the same melting temperature (Tm). It is
also preferred that conditions be provided in which only the genetic bar code of a bar-coded
vector is in single stranded form, while the remainder of the vector remains in double
stranded form, so as to minimize interactions among vectors carrying different but possibly
related cDNA inserts. A bar code in a double-stranded vector can be exposed in
single-stranded form using an enzyme having 3' to 5' exonuclease activity, such as T4 DNA
polymerase, and a bar code population having one nucleotide omitted from all bar code
sequences. Such an exonuclease activity is capable of cleaving nucleotides from a 3'

recessed end.

For example, if C is omitted from the bar code sequence, then T4 polymerase may be used
in the presence of G and the linearized double stranded vector to expose the bar code in
single stranded form. As a further example, if A is omitted from the bar code sequence, T4
polymerase may be used in the presence of T and the linearized double stranded vector to
expose the bar code in single stranded form. In other words, during a T4 exonuclease
digestion, all nucleotide triphosphates (NTPs) are omitted from solution except the NTP
complementary to the nucleotide omitted from the bar code. In this way, T4 DNA
polymerase and a bar code population lacking one of the four nucleotides in its sequence
may be used to make the bar code single-stranded and protruding from the end of a
linearized, double-stranded vector. Alternatively, such an enzymatic exonuclease may be
used when all four nucleotides are present in the bar code so long as the timing of the
reaction is closely controlled to expose only the bar code in single stranded form.
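
The logic of the omitted-nucleotide scheme may be illustrated with the following Python sketch, which is a simplified model and not a description of enzyme kinetics. It assumes that the bar code lies at the extreme 5' end of the top strand of the linearized vector, that the exonuclease removes nucleotides from the 3' end of the opposite strand, and that digestion stalls at the first position where the single supplied nucleotide can be re-incorporated. The function name exposed_length and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def exposed_length(top_strand: str, supplied_nucleotide: str) -> int:
    """Length of top strand left single-stranded after digestion.

    top_strand is written 5'->3' and begins with the bar code. The bottom
    strand, read 3'->5', pairs base-by-base with it. Digestion of the bottom
    strand proceeds from its 3' end and stalls at the first base that matches
    the supplied nucleotide, because the polymerase activity can replace it.
    """
    bottom_3_to_5 = top_strand.translate(_COMPLEMENT)
    for removed, base in enumerate(bottom_3_to_5):
        if base == supplied_nucleotide:
            return removed
    return len(bottom_3_to_5)


# Bar code lacking C, followed by vector sequence that begins with C.
bar_code = "TGAGGTAGGA"          # hypothetical 10-mer built without C
vector_flank = "CATGAATTC"       # hypothetical flanking vector sequence
print(exposed_length(bar_code + vector_flank, "G"))
```

Under these assumptions the exposed single-stranded region is exactly the length of the bar code, provided that the first vector nucleotide following the bar code is the omitted nucleotide.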
In one embodiment, a genetic bar code pool may be designed using a set of
nucleotide dimers as building blocks for synthesizing the pool. For example, one nucleotide
dimer set consists of TG, AG, GA and GT. It is notable that this set has a minimum of one
nucleotide difference between any two given dimers. Such a set is referred to herein as a
minimal mismatch set (MMS). The average pair-wise mismatch for the set of TG, AG, GA
and GT is 1.7 nucleotides, or 83%. The average pair-wise mismatch is computed by adding
all the possible pair-wise nucleotide differences and dividing by the total number of pairs.
In this case, it is 10/6, or about 1.7 nucleotides, or (10/6)/2 = 83%. While the omitted
nucleotide in the MMS listed above is C, the omitted nucleotide may be any of the four
nucleotides when designing such an MMS. Nucleotide dimers are chosen such that the Tm
for each dimer remains constant. In this way, the Tm for each genetic bar code of a pool
will be the same. Methods for computing Tm are well known to one skilled in the art. The
omission of a nucleotide in design of a bar code pool may be used to allow formation of a
protruding end encoding the bar codes, as described above. A pool of 20-mers generated
through random synthesis using the above-listed set of nucleotide dimers will produce a
pool having a diversity of 4^10 or 1.05 x 10^6 genetic bar codes. The minimum percentage of
pair-wise mismatches within this pool of genetic bar code 20-mers is 1/20 or 5%, while, as
noted above, the average pair-wise mismatch between any two genetic bar codes is 83%.
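
The pair-wise mismatch computation for the dimer set may be reproduced with the following Python sketch. The helper names are hypothetical. The Tm estimate uses the Wallace rule (2 degrees C per A or T, 4 degrees C per G or C), which is only a rough approximation and is used here as an assumption to show that every dimer contributes equally.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_stats(blocks):
    """Return (minimum, average) pair-wise mismatch over all pairs."""
    distances = [hamming(a, b) for a, b in combinations(blocks, 2)]
    return min(distances), sum(distances) / len(distances)


def wallace_tm(seq: str) -> int:
    """Rough Tm estimate by the Wallace rule (an assumed approximation)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


DIMERS = ["TG", "AG", "GA", "GT"]
minimum, average = mismatch_stats(DIMERS)
print(minimum, round(average, 2), f"{average / 2:.0%}")
print({d: wallace_tm(d) for d in DIMERS})   # equal contribution per dimer
print(len(DIMERS) ** 10)                    # diversity of a 20-mer pool
```

Because each dimer contains exactly one G and one A or T, any 20-mer assembled from ten such dimers receives the same Wallace estimate.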

In another embodiment, genetic bar codes may be designed using a set of nucleotide
trimers, each trimer having a minimum of two nucleotides different from any other trimer
and each trimer having one G. Such an example set, which omits C, consists of AGT, TGA,
TAG, ATG, GTA and GAT. The average pair-wise mismatch between any two genetic bar
codes produced using this MMS is 2.4 nucleotides, or 80%. An oligonucleotide pool of
genetic bar codes constructed randomly from eight rounds of synthesis using the
above-listed six trimers will have a diversity of 1.68 x 10^6 with each bar code having a length of
24 nucleotides. The minimum percentage of pair-wise mismatch within this pool of genetic 
bar codes is 2/24 or 8.3%, while the average pair-wise mismatch between any two genetic 
bar codes in this set is 80%. 
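
Random assembly of a bar code from the trimer set may be sketched in Python as follows. The function name random_bar_code and the fixed seed are hypothetical choices made only so that the example is repeatable; the sketch models the combinatorics of the pool and not the chemistry of synthesis.

```python
import random

TRIMERS = ["AGT", "TGA", "TAG", "ATG", "GTA", "GAT"]


def random_bar_code(blocks, rounds: int, rng: random.Random) -> str:
    """Concatenate one randomly chosen building block per synthesis round."""
    return "".join(rng.choice(blocks) for _ in range(rounds))


rng = random.Random(0)                  # fixed seed, for repeatability only
code = random_bar_code(TRIMERS, 8, rng)
print(code, len(code))
print(len(TRIMERS) ** 8)                # number of distinct 24-mers possible
```

Every bar code produced in this way is 24 nucleotides long, contains exactly eight G nucleotides and contains no C, so that all members of the pool share the same base composition.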

If 4-mer oligonucleotides are used as building blocks, each differing from all other
4-mers at a minimum of three positions and each having one G,
then there are eight building blocks in the MMS, as follows: GATT, TGAT, TAGA, TTTG,
GTAA, AGTA, ATGT and AAAG (see also U.S. Patent No. 5,635,400 by Brenner).
Alternatively, one can choose another MMS from a total of 32 possibilities (i.e. 2^3 x C(4,1), where
C designates combinatorial) of 4-mers having one G nucleotide and 3 other nucleotides (T,
A, or mixture of T and A). For example, an MMS may consist of GATA, GTAT, AGAA,
TGTT, AAGT, TTGA, ATTG and TAAG. A pool of genetic bar codes constructed using
seven 4-mer subunits will produce a bar code length of 28 nucleotides and a bar code
diversity of 8^7, or about 2.1 x 10^6. The minimum percentage of pair-wise mismatches
within this genetic bar code pool is 3/28 or 10.7%.
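
The minimal mismatch property of the two tetramer sets listed above may be verified with a short Python sketch. The names SET_1, SET_2 and min_pairwise_mismatch are hypothetical labels introduced for the example.

```python
from itertools import combinations

SET_1 = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]
SET_2 = ["GATA", "GTAT", "AGAA", "TGTT", "AAGT", "TTGA", "ATTG", "TAAG"]


def min_pairwise_mismatch(blocks) -> int:
    """Smallest number of differing positions over all pairs of blocks."""
    return min(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(blocks, 2)
    )


for name, blocks in (("set 1", SET_1), ("set 2", SET_2)):
    print(name, min_pairwise_mismatch(blocks))

subunits = 7
print(len(SET_1) ** subunits)                    # diversity of the 28-mer pool
print(round(3 / (4 * subunits) * 100, 1))        # minimum mismatch, percent
```

Both sets satisfy the three-nucleotide cut-off, and seven subunits drawn from eight building blocks give 8^7 distinct 28-mers.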

In general, if an N-mer oligonucleotide building block has a number of G
nucleotides equal to k, with the remaining nucleotides (i.e., N-k) consisting of A
or T, or an A plus T mixture, then the bar code diversity is equal to 2^(N-k) x C(N,k). Further,
there exist 2^(N-k) x C(N,k) MMS among these total possible constructs for a given minimal
mismatch cut-off. In general, the number of sequences in different MMS is not the same.
For example, in the case of 5-mers with two G in each sequence and three nucleotides of A or T
or an A and T mixture, there are 80 sets of MMS. Some of these MMS have 8 sequences and
some have 12 sequences for a mismatch cut-off of 3. It is generally true also that the larger
the N for a given minimal mismatch cut-off, the larger the number of sequences in any set
of MMS. It is also true that for a given diversity number in a library of genetic bar codes 
constructed from N-mer nucleotide subunits (such as those listed above, 8 tetrameric MMS 

- Wm - erexam p^ 

library.
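
The counting formula 2^(N-k) x C(N,k) may be checked against direct enumeration with the following Python sketch. The function names are hypothetical; math.comb requires Python 3.8 or later.

```python
from itertools import product
from math import comb


def block_count(n: int, k: int) -> int:
    """Number of N-mers having exactly k G and otherwise only A or T."""
    return 2 ** (n - k) * comb(n, k)


def enumerate_blocks(n: int, k: int):
    """List every N-mer over A, G, T that contains exactly k G."""
    return ["".join(p) for p in product("AGT", repeat=n) if p.count("G") == k]


for n, k in ((4, 1), (5, 2), (9, 4)):
    print(n, k, block_count(n, k), len(enumerate_blocks(n, k)))
```

The formula and the enumeration agree for the 4-mer, 5-mer and 9-mer cases discussed in this section.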

Accordingly, another way of constructing a pool of genetic bar codes is to use a 
certain number (e.g. 100) of oligonucleotides as building blocks selected from all possible 
combinations of a fixed length (e.g. 9-mer) with a certain minimum number (e.g. four) of 
nucleotides different among them. The diversity of genetic bar codes will be precisely 

1,000,000 if the bar codes are composed of three subunits of 9-mers. Further, the minimum
percentage difference between any two bar codes within this pool is 4/27 or 14.8%. The
average pair-wise sequence difference in this pool of 1,000,000 genetic bar codes is 65%, or
17.5 nucleotides. In a preferred embodiment, to ensure hybridization stability, the proportion
of G nucleotides in an MMS nucleotide subunit ranges from 45% to 50%. The number of G
nucleotides is the same in all bar codes. A pool of 36-mer genetic bar codes, each
constructed from four 9-mers of this MMS, has a diversity of one hundred million. The
minimal pair-wise sequence mismatch between any two 36-mers in this pool of
genetic bar codes is 4/36 or 11.1%, while the average pair-wise sequence mismatch is 65%,
or 23.4 nucleotides.

The following example lists one hundred 9-mers of a minimal mismatch set, each
having four G nucleotides and five nucleotides selected from the group consisting of A, T,
and an A plus T mixture, and further having a pair-wise mismatch of at least 4 nucleotides
between any two 9-mers. Calculated using the formula set forth above, there exist 4,032
members in this MMS of 9-mers (i.e. 2^5 x C(9,4)). The typical number of sequences in each of
these 4,032 sets ranges from 80 to 101. A pool of 27-mer genetic bar codes, each
constructed from three 9-mers of this MMS, has a diversity of one million. The minimum
pair-wise sequence mismatch between any two 27-mers in this pool of genetic bar codes is
14.8% while the average pair-wise sequence mismatch is 65%. Likewise, a population of
36-mer genetic bar codes, each constructed from four 9-mers of this MMS, has a diversity
of one hundred million. The minimal pair-wise sequence mismatch between any two
36-mers in this pool is 14.8%, while the average pair-wise sequence mismatch is 65%, or 23.4
nucleotides.
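
The count of 4,032 candidate 9-mers, and one possible way of drawing a minimal mismatch set from them, can be sketched as follows. The greedy selection shown here is an assumption made for illustration; the disclosure does not state which selection procedure was used, and the function names are hypothetical.

```python
from itertools import combinations, product


def candidate_blocks(length=9, n_g=4, others="AT"):
    """Enumerate every sequence with exactly n_g G's and the rest from `others`."""
    blocks = []
    for g_positions in combinations(range(length), n_g):
        g_set = set(g_positions)
        for fill in product(others, repeat=length - n_g):
            rest = iter(fill)
            blocks.append(
                "".join("G" if i in g_set else next(rest) for i in range(length))
            )
    return blocks


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_mms(candidates, min_mismatch=4):
    """Keep each candidate that differs from all kept ones by >= min_mismatch."""
    chosen = []
    for seq in candidates:
        if all(hamming(seq, kept) >= min_mismatch for kept in chosen):
            chosen.append(seq)
    return chosen


candidates = candidate_blocks()
print(len(candidates))  # 4032, i.e. 2^5 x C(9,4)

mms = greedy_mms(candidates)
print(len(mms))  # size of one greedily built set; depends on candidate order
```

Starting the greedy pass from a different candidate generally yields a different set, which is one way to read the statement that many alternative sets are available under this scheme.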

The following one hundred 9-mers each have four G nucleotides. This list sets forth one of the
4,032 minimal mismatch sets available under this scheme.
GGGTATGAA 

GGGGAAAAT 

GGGGTATTA

GGGAGTATA 

GGGAAGTTT 

GGGTTTAGT 

GGGATTTAG 
GGAGGTTAA

GGAGAGATA 

GGTGTGTAT 

GGAGTTGTT 

GGTGATTTG 
GGAAGGAAT

GGTTGGTTA 

GGTAGAGAA 
GGATGAAGA

GGTAGTTGT 

GGAAGATTG 

GGTTGTAAG 

GGAATGTGA 

GGATAGTAG 

GGTATGATG 

GGAAAAGGT 

GGTTTATGG 

GAGGGTTTT 

GAGGAGTAA 

GTGGTGATT 

GTGGATAGA 

GTGTGGAAA 

GAGTGAGAT 

GAGAGATGA 

GAGATGGTA 

GAGTAGATG 

GTGTTAGGA

GTGAAAGAG 

GAAGGAGTA 

GTTGGTGAT 

GATGGAAGT 

GTAGGAAAG 

GATGAGGTT 

GTAGTGGAA 

GTAGAGTGT 

GAAGTGTTG 

GATGTTGGA 

GAAGATGAG 

GTTGTAGTG 

GTATGGGTT 




WO 99/55886 



PCT/US99/08823 



GATAGGTAG 

GTAAGTGGA 

GAATGTTGG 

GAATAGGGA 

GTTATGGGT 

GTATTGAGG 

GTTTATGGG 

AGGGTGAAA 

AGGGATTGT 

TGGGTTATG 

AGGTGGATT 

TGGAGGTAA 

AGGAGTGAT 

TGGTGAGTA 

TGGAGAAGT 

AGGTGATAG 

TGGTTGGAT 

TGGTAGAGA 
TGGATTGGA

AGGATAGTG 

TGAGGGTTT 

AGTGGAGTT 

AGTGGTAGA 

AGAGAGGAT 

TGTGTGGTA 

TGTGAGAAG 

AGAGTAGGA 

AGTGTTGAG 

TGAGAAGTG 

AGAAGGGTA 

TGATGTGGT 

TGTAGTGTG 
AGTTAGGTG

AGAAAGAGG 

ATGGGGTAT 

TAGGGGATA 

ATGGGTGTA 

AAGGGTAAG 

TTGGGATTG 

TAGGTGTGT 

TAGGAAGGA 

ATGGTAAGG 

TTGTGTGAG 

ATGAGTTGG 

AAGAAGGGT 

AAGTTTGGG 

AATGGGGAA 

TTTGGGTGA 

ATTGGGATG 

AATGAGTGG 

TTAGTTGGG 

TATTGGAGG 
AAAAGAGGG 

If a restriction endonuclease or other mechanism is used to generate a single-stranded
region in a bar-coded vector, then all four nucleotides may be used for design and
synthesis of the genetic bar code. For example, restriction endonucleases such as Bbv I,
Bbs I, Bsa I, BsmA I, BsmF I, BspM I, Fok I, Hga I and SfaN I may be used to generate an
end having four or five protruding nucleotides.

5.2.7. PRODUCTION OF SINGLE-STRANDED GENETIC 
BAR CODES IN DOUBLE STRANDED LIBRARY 
VECTORS 

An example restriction endonuclease for linearizing a bar-coded library and further
exposing the genetic bar codes in single stranded form is Bgl II which may be used at site 1
in FIG. 5. Bgl II generates a 3' recessed end lacking G. T4 DNA polymerase is then used
in the presence of GTP (and the absence of other NTPs) to digest the complementary strand
of the genetic bar code. If an enzyme such as EcoR I is used for site 2 (FIG. 1), T4 DNA
polymerase will stop degradation at the EcoR I site when it encounters nucleotide G. T4
DNA polymerase is then inactivated by heat. The bar-coded cDNA library having
protruding single-stranded bar codes may then be purified using standard phenol/chloroform
extraction and ethanol precipitation. For the convenience of cloning, a Bgl II site plus two
additional nucleotides at its 5' end (for effective digestion with Bgl II) is synthesized as a
standard component of the vector preceding the first nucleotide of the genetic bar code.
Likewise, an EcoR I site plus one additional nucleotide at its 3' end (for effective digestion
with EcoR I) is synthesized as a standard component of the vector after the last nucleotide
of the genetic bar code.
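
The stopping behavior described above can be pictured with a deliberately simplified model: the complementary strand is trimmed from its 3' end until a G is reached. The sketch below is not a model of enzyme kinetics. The function names, the example bar code and the two flanking nucleotides "AC" are hypothetical and chosen only for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def chew_back(strand_5to3, stop_base="G"):
    """Trim a strand from its 3' end until the terminal base is stop_base.

    Simplified picture of 3' to 5' digestion in the presence of a single
    nucleotide: removal halts at the first position that can be refilled.
    """
    end = len(strand_5to3)
    while end > 0 and strand_5to3[end - 1] != stop_base:
        end -= 1
    return strand_5to3[:end]


# A bar code built only from G, A and T has a complement free of G.
bar_code = "TAGTGAATG"
# Hypothetical complementary strand, 5' to 3': flank, EcoR I site (GAATTC),
# then the complement of the bar code at the exposed 3' end.
complementary = "AC" + "GAATTC" + reverse_complement(bar_code)
print(chew_back(complementary))  # ACG
```

In this picture the whole complement of the bar code is removed and trimming halts at the G of the EcoR I site, leaving the bar code strand unpaired.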

5.2.8. HYBRIDIZATION OF A BAR-CODED LIBRARY
WITH A GENE CHIP, WITH BAR-CODED BEADS OR
WITH A BIOLOGICAL ARRAY

Hybridization of a bar-coded cDNA library with a genetic bar code population
(whether on gene chips, beads or biological arrays) is carried out for several hours to overnight
at an optimal temperature, preferably [...]
the genetic bar code population. The prehybridization buffer may be as follows: 6 x SSC
(or 6 x SSPE), 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 8.0), 0.5% SDS, 100
µg/ml denatured, fragmented salmon sperm DNA, and 0.1% nonfat dried milk.
Hybridization buffer may be 3.0 M TMA Cl or 2.4 M TEA Cl, 0.01 M sodium phosphate
(pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS, 100 µg/ml denatured, fragmented salmon
sperm DNA, and 0.1% nonfat dried milk. A hybridized bar-coded cDNA library may be
washed with 6 x SSC solution and 2 x SSC solution as needed.

5.2.9. OTHER APPLICATIONS OF cDNA OR GENOMIC
LIBRARIES SORTED USING GENETIC BAR CODES

A sorted, single-stranded cDNA library or genomic library can be used to create a
"library array" for studying differential gene expression, as described below.
To make a single-stranded library array, a bar-coded library having a protruding
single-stranded genetic bar code region is hybridized with a gene chip. This is followed by
incubation with a DNA ligase, such as T4 ligase, to covalently bond the complementary
strand of cDNA to the gene chip. The sense strand of cDNA may be removed by, for
example, denaturing (e.g. 100 °C incubation) under high stringency or through using an
enzyme such as Exo III and the like (Hoheisel, 1993, Anal. Biochem. 209:238), so as to
convert a double stranded library into a single stranded library. For preparation of a single
stranded antisense library, the genetic bar code region is placed in front of a CMV promoter
and on the same strand as the sense cDNA. For preparation of a single stranded sense library, the
genetic bar code region is placed at the end of the cDNA and on the same strand as the antisense
cDNA.

Among the many advantages of making such a sorted library array using the genetic
bar code technology of the invention are the following: 1) such a library array can be
constructed at a density as high as an oligonucleotide array on a gene chip; 2) only a single
sample is needed instead of a million samples for individual spotting for the current DNA
array technology (i.e. since the cDNA library can be sorted using the genetic bar code
method, only one sample containing the whole cDNA library is prepared, instead of
preparing 10^6 individual samples as would be otherwise required); and 3) many different
cDNA libraries [...]
of bar coded cDNA libraries obtained from various sources without changing the format of
the gene chip).

5.2.10. BEADS CAN REPLACE CHIPS WHEN USING 
GENETIC BAR CODES TO SORT A LIBRARY 

In another embodiment of the invention, beads may be used instead of a gene chip or
biological array, for sorting a cDNA library. In this embodiment, each individual bead
carries multiple copies of one unique complementary bar code. Therefore, each individual
bead will hybridize to multiple copies of a single recombinant, thereby sorting and
concentrating individual members of the library to discrete loci. As spherical or spheroid
supports which can migrate in solution, beads may provide enhanced hybridization
efficiency compared to gene chips or biological arrays. Following hybridization, each bead
represents a single, easily manipulable recombinant which may be assayed under a
high-throughput pharmaceutical screening format (e.g. one bead per well) to determine biological
functions of the encoded cDNAs. For example, one can place each unique bead, with a
bar-coded vector DNA hybridized thereto, into a well of an assay plate or into a
micro-compartment of a micro-compartmentalization device (e.g. a 96-well plate, a 384-well plate,
or a plate having a larger number of wells or micro-compartments). Such a
micro-compartmentalization device may be as described in FIG. 1 through FIG. 4 of copending
U.S. Patent Application No. 09/065,776, filed April 24, 1998, entitled
"MICRO-COMPARTMENTALIZATION DEVICE AND USES THEREOF," by Cen and Sun
(Attorney Docket No. 9557-003), which is incorporated herein by reference in its entirety.

Bead placement may be performed robotically. The presence of a low salt solution in each
well permits dissociation of vector DNA from the beads. The resulting solution in each
well, now containing DNA of a single recombinant, may be chemically
transfected or electroporated into readout cells. A removable micro-compartmentalization
device as described herein may be used during the transfection or electroporation procedure.
If such a device is used, then the grid of the device may be removed from the readout cell
culture following gene transfer so as to facilitate processing (i.e., rinsing, culturing, and
assaying for biological function). Any recombinant producing a positive signal in a readout
assay may be recovered, for example, by sampling the positive cell population and
performing PCR using primers flanking the cDNA insert of the vector.
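
For planning a one-bead-per-well screen, the number of plates follows directly from the library diversity and the plate format. The sketch below is a back-of-the-envelope calculation only. The function name is hypothetical, and the 1536-well format is an assumed example of "a plate having a larger number of wells", not a format named in the text.

```python
def plates_needed(n_recombinants, wells_per_plate):
    """Smallest number of plates that holds one recombinant per well."""
    return -(-n_recombinants // wells_per_plate)  # ceiling division


# Library diversity of 10^6, one bead per well.
for wells in (96, 384, 1536):
    print(wells, plates_needed(1_000_000, wells))
# 96 10417
# 384 2605
# 1536 652
```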


5.2.11. AN ALTERNATIVE METHOD FOR GENERATING A 
SINGLE STRANDED GENETIC BAR CODE REGION 
ON A DOUBLE STRANDED VECTOR 

To selectively expose single stranded DNA encoding the genetic bar code region of
a double stranded vector, a protein binding site may be installed in the vector. A DNA
binding protein which recognizes the protein binding site in the vector can then be used to
sterically hinder (i.e. block or prevent) the 3' to 5' exonuclease progression beyond the bar
code region of the vector. Such DNA binding proteins may include prokaryotic proteins
such as a lactose repressor which binds to a lactose operator sequence, a tetracycline (tet)
repressor which binds to a tet operator sequence, etc., or eukaryotic proteins such as a
eukaryotic transcription factor (e.g. E2F, AP1, SP1, p53, etc.). Any chosen protein binding
site may be installed outside of the genetic bar code region defined by site 1 and site 2 using
standard recombinant DNA techniques (see FIG. 1). When cDNA library plasmids are
linearized at site 1, DNA binding proteins may be applied to occupy two sets of DNA
binding sites. Subsequently, a 3' to 5' single strand exonuclease (e.g. exonuclease III, T4
DNA polymerase, Klenow fragment, T7 DNA polymerase, Vent DNA polymerase, Pfu
DNA polymerase, etc.) may be used to remove the complementary strand of the genetic bar
code and site 2 for the genetic bar code end, and to remove site 1 for the non genetic bar
code end. Exonuclease activity is stopped at the DNA binding sites by steric hindrance
from the bound DNA binding proteins. Then the exonuclease is heat inactivated and both
DNA binding proteins and exonuclease are removed from the DNA by phenol/chloroform
extraction. When exonuclease III is used as the 3' to 5' exonuclease, the protection of the
non genetic bar code end from 3' to 5' exonuclease activity can also be achieved by
generating a 3' overhang which is resistant to exonuclease III. This approach may be used
as an alternative to the DNA-protein complex formation described above which sterically
hinders (i.e., blocks or prevents) exonuclease progression along the DNA. A 3' overhang
for this purpose can be obtained by installing a restriction enzyme site which produces such
an overhang on the right side of site 1 (see FIG. 1). Suitable enzymes include Hae II, Kpn I,
Nsi I, Pst I, Sac I, etc.

If a single stranded genetic bar code region is generated in the manner described
above, all four nucleotides may be included in the genetic bar
codes (as opposed to using only three nucleotides as described herein). For a given
nucleotide building block length (e.g. 3-mers to 10-mers or more), the number of unique
sequences in any minimal mismatch set (MMS) for a given mismatch cutoff will be larger.
In other words, it is possible to achieve a higher percentage of minimal pair-wise mismatch
when using all four nucleotides. Accordingly, one benefit to using all four nucleotides is to
achieve a reduction of potential cross hybridizations among similar but non-identical bar
codes. Of course, this benefit must be weighed against the benefits of using bar codes
consisting of three nucleotides in a given situation.
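
The effect of alphabet size on set size can be explored with a small search. The sketch below is an illustration under stated assumptions: it uses 5-mers and a mismatch cutoff of 3 only to keep the search fast, it applies a simple greedy pass rather than any procedure from the disclosure, and all names are hypothetical. A greedy pass gives only a lower bound on the largest possible set, so the printed sizes are indicative rather than definitive.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_set(alphabet, length, min_mismatch):
    """Greedily collect sequences that pairwise differ by >= min_mismatch."""
    chosen = []
    for letters in product(alphabet, repeat=length):
        seq = "".join(letters)
        if all(hamming(seq, kept) >= min_mismatch for kept in chosen):
            chosen.append(seq)
    return chosen


three_letter = greedy_set("AGT", 5, 3)   # three-nucleotide bar code blocks
four_letter = greedy_set("ACGT", 5, 3)   # all four nucleotides allowed
print(len(three_letter), len(four_letter))
```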
5.2.12. ALTERNATIVES FOR DESIGNING GENETIC BAR
CODES NOT USING AN OLIGONUCLEOTIDE
BUILDING BLOCK APPROACH

Computer algorithms may be used to facilitate the design of unique genetic bar code
sets having oligos of a given length, minimal cross hybridization, and the same or similar
melting temperature (Tm) (see e.g. U.S. Patent Nos. 5,635,400 and 5,654,413 by Brenner).
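
As one example of such a computational filter, candidate oligos can be screened for similar Tm. The source does not say how Tm is to be computed; the sketch below assumes the Wallace rule (2 °C per A or T, 4 °C per G or C), a common rough approximation for short oligonucleotides. The function names and the tolerance value are hypothetical.

```python
def wallace_tm(seq):
    """Approximate Tm in degrees C using the Wallace rule (an assumption here)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def within_tm_window(seqs, target, tolerance=2):
    """Keep sequences whose estimated Tm lies within tolerance of target."""
    return [s for s in seqs if abs(wallace_tm(s) - target) <= tolerance]


# Building blocks with four G's and five A/T's share the same estimate.
blocks = ["GGGTATGAA", "GGGGAAAAT", "AAAAGAGGG"]
print([wallace_tm(b) for b in blocks])  # [26, 26, 26]
print(within_tm_window(blocks + ["GGGGGGGGG"], target=26))
# ['GGGTATGAA', 'GGGGAAAAT', 'AAAAGAGGG']
```

Under this approximation, fixing the number of G nucleotides in every building block fixes the estimated Tm as well.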

Non-natural nucleotides (i.e. modified nucleotides or nucleotide derivatives) may be
used when generating a complementary genetic bar code array to enhance the binding
affinity between genetic bar codes and their complements.

Still further, when using the three nucleotide strategy to generate a protruding single
stranded genetic bar code region from a double stranded vector, any DNA polymerase
having 3' to 5' exonuclease activity may be used. Such polymerases include Klenow, T7,
Vent, Pfu, and T4 DNA polymerases.



5.3. OUTPUT CONSIDERATIONS

Methods are provided to systematically screen expressed genes of the human
genome for specific functions using a large number of functional assays. Such technology
provides a very rapid system for gene identification in which at least some functional
information may be [...]. In
examples of cell-based readout assays, changes in cellular morphology, immunostaining, or
reporter gene expression can be detected within 1-2 days after library gene expression. In
examples of biochemical-based readout assays, changes in ligand binding, growth factor
activity, or enzymatic activity can be detected within 1-2 days after library gene in vitro
transcription and translation. Using either approach, full functional screening of a library
having a diversity of 10^6 can be completed within a few days.

In this way, the methods of the invention provide the advantages of minimizing
workload and reducing the occurrence of gene mutations which can arise in screening
assays employing long-term culture. It is estimated that one of ordinary skill in the art can
easily screen one library per week by the methods of the invention without using
automation. Of course, if automation is used, multiple libraries per week may be screened.

Cell based immunostaining assays have been extensively used under the single-gene
approach for detection of gene expression, subcellular localization and biological functions.
Two examples of established mammalian cell-based immunostaining assays are as follows.
First, cytochemical detection of intracellular LDL-derived cholesterol accumulation has
been used to demonstrate that cholesterol accumulates in fibroblasts of individuals having
Niemann-Pick C1 disease (see Carstea et al., Science 277, 228-231). Second, cellular
localization of heat shock transcription factor (HSF) has been used to demonstrate that HSF
nuclear localization changes from a uniform distribution to a punctate distribution when
staining for activated HSF after c-myb expression in 293T cells (see Kanei-Ishii et al.,
Science 277, 246-248).

Similar to these two examples, functional assays of use together with the methods of
the invention will measure changes (i.e. induction or reduction) of target gene expression,
changes of cellular localization of a specific antigen, changes in cellular behaviors (e.g.
growth factor secretion, apoptosis factor secretion, differentiation factor secretion), and
changes in cellular morphology.

There are many assays of gene function in existence, each having a particular
readout. For example, induction or reduction of target gene mRNA or protein expression
can be detected by means standard in the art, including nucleic acid hybridization and
antibody detection of specific antigens.

There are at least three categories of readout assays for use with the methods of the
invention: (a) assays for genes associated with specific signaling pathways; (b) assays for genes
associated with specific diseases; and (c) assays for genes associated with cellular
physiological functions. Each of these categories is further described below.

5.3.1. PATHWAYS 

The signals a cell receives, whether from outside or inside the cell, are generally
transmitted through a cascade of molecular interactions, including protein-protein
interactions. The overall process is generally termed signal transduction. The signaling
pathways which may be assayed for identification of associated genes include, but are not
limited to, a mitogenic signaling pathway, a STAT signaling pathway, an NFkB signaling
pathway, a stress signaling pathway, an apoptosis signaling pathway, an NFAT signaling
pathway, a Wnt signaling pathway, a CREB signaling pathway, an AP-1 signaling pathway,
a proliferation signaling pathway and an anti-proliferation signaling pathway.

For proliferation signaling, BRDU incorporation or PCNA induction can be the
cellular event detected. For stress signaling, p53 induction, Jun induction, or nuclear HSF3
aggregates can be the detectable cellular event. For apoptosis signaling, detection by
ApoAlert™ staining (Gavrieli et al., 1992, J. Cell. Biol. 119, 493) or annexin staining can
be the detectable cellular event. For anti-proliferation signaling, detection of p21, p27, p57,
p15, p16, p18, or p19 induction can be the detectable cellular event. For Wnt signaling,
detection of β-catenin induction or β-catenin re-localization can be the detectable cellular
event. For STAT signaling, detection of induction of a reporter gene under the control of a
STAT1, 2, 3, 4, 5, 6, or 7 promoter can be the detectable cellular event. For AP-1 signaling,
detection of c-fos induction can be the detectable cellular event. For CREB signaling,
CREB phosphorylation, or induction of a reporter gene under the control of the CREB
promoter, can be the detectable cellular event. For NFkB signaling, NFkB re-localization,
or induction of a reporter gene under NFkB promoter control, can be the detectable cellular
event. For NFAT signaling, IL-2 mediated proliferation can be the detectable cellular
event. Other signaling pathways can include Hedgehog signaling (detectable by GLI-1,
GLI-2, and GLI-3 induction); nuclear receptor signaling (detectable by induction or
reduction of a reporter gene under estrogen, retinoic acid, vitamin D3 or thyroid hormone
responsive promoters); antiviral signaling (detectable by induction of interferon alpha or
beta); [...]
responsive promoter); BMP signaling (detectable by nuclear translocation of Smad); and
insulin signaling (detectable by Glut1 or Glut4 re-localization).
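
When many assays are run side by side, the pairing of pathway and detectable event listed above lends itself to a simple lookup table. The sketch below records only a subset of the pairings stated in the text. The dictionary layout, key spellings and function name are illustrative assumptions and not part of the disclosed methods.

```python
# Pathway -> detectable cellular events, transcribed from the paragraph above.
PATHWAY_READOUTS = {
    "proliferation": ["BRDU incorporation", "PCNA induction"],
    "stress": ["p53 induction", "Jun induction", "nuclear HSF3 aggregates"],
    "apoptosis": ["ApoAlert staining", "annexin staining"],
    "Wnt": ["beta-catenin induction", "beta-catenin re-localization"],
    "AP-1": ["c-fos induction"],
    "NFAT": ["IL-2 mediated proliferation"],
    "BMP": ["nuclear translocation of Smad"],
    "insulin": ["Glut1 re-localization", "Glut4 re-localization"],
}


def readouts_for(pathway):
    """Return the detectable events recorded for a pathway, or an empty list."""
    return PATHWAY_READOUTS.get(pathway, [])


print(readouts_for("Wnt"))
# ['beta-catenin induction', 'beta-catenin re-localization']
print(readouts_for("unlisted pathway"))  # []
```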



5.3.2. DISEASES 

Specific diseases of interest include, but are not limited to, cancer, inflammation,
atherosclerosis, autoimmune diseases, diabetes, infection, diseases of metabolism (e.g.
obesity), and neurodegenerative diseases (e.g. Alzheimer's disease and Parkinson's disease).
Readout assays involving detection of changes (i.e. increases or decreases) in the levels of
the following targets may identify genes associated with the indicated specific disease.
Assays that may detect genes involved in cancer include assays for detection of:
HLA for immune surveillance; OSM for anti-cancer growth; GADD45 and GADD153 for
tumor suppression; nm23 for anti-metastasis; VEGFA, VEGFB, VEGFC, PlGF, and FGF2
for angiogenesis; MDR for drug resistance; CASP100 for apoptosis; and PDGFA, PDGFB,
FGF1, 3, 4, 5, 6, 7, 8, and 9, IGF I, IGF II, cyclin A, B1, C, D1, D2, D3, E, F, G1, and H,
c-myc and c-Jun for growth. Assays that may detect genes involved in inflammation
include detection of Cox2, IMP, IL-6, TNFα, IL-13, E-selectin, VCAM1, ICAM 1 and 2,
NFkB, c-Rel, RelB, IkBα, IkBβ, and Bcl3.
Changes in the level of expression of the following targets may be assayed
immunologically in response to expression of a heterologous cDNA in order to detect genes
which may be involved in a given disease. For example, the potential targets that can be
used for detecting genes involved in atherosclerosis include Egr-1. The potential targets that
can be used for detecting genes involved in autoimmunity include Fas and Fas ligand. The
potential targets that can be used for detecting genes involved in diabetes include insulin.
The potential targets that can be used for detecting genes involved in infection include
chemokines (MIP-1α, MIP-1β, MIP-2, RANTES, MCP-1, MCP-2, GROα, GROβ, GROγ,
ENA-78, I-309, and IP10) and various cytokines (e.g. IL-2, IL-13, GM-CSF, G-CSF, and
M-CSF). The potential targets that can be used for detecting genes involved in obesity
include leptin and the leptin receptor. The potential targets that can be used for detecting
genes involved in Alzheimer's disease include Tau, CRF, CRF receptor, CRF-BP,
Urocortin, and neuronal growth factors (e.g. BDNF, NT3, NT4, NT5, CNTF, and GDNF).
The potential targets that can be used for detecting genes involved in Parkinson's disease
include TH and synuclein.

5.3.3. FUNCTIONS 

Where identification of genes associated with various physiological functions is
desired, an assay may be employed which can detect changes in such functions as cell
growth, apoptosis, senescence, differentiation, adhesion, binding of a cell to a specific
molecule, binding of a cell to another cell, cellular organization, organogenesis, intracellular
transport, transport facilitation, protein synthesis, transcription, energy conversion,
metabolism, myogenesis, neurogenesis, or hemopoiesis. Examples of such cellular
physiological functions and assays for detecting changes in them include, but are not
limited to: cholesterol transport, detectable by detecting intracellular cholesterol
accumulation; myogenesis, detectable by detecting induction of MyoD or MEF-2;
neurogenesis, detectable by detecting induction of NeuroD; and vasodilation and
neurotransmission, detectable by induction of inducible nitric oxide synthase (iNOS).

A number of specific exemplary assays which may be used to identify genes in
conjunction with the methods of the invention are set forth in detail below.

5.4. CELLULAR READOUT ASSAYS

5.4.1. PROLIFERATION PATHWAY

Bromodeoxyuridine (BRDU) incorporation may be used as an assay to identify
genes involved in proliferation. The BRDU assay identifies a cell population undergoing
DNA synthesis by incorporation of BRDU into newly-synthesized DNA. Newly-synthesized
DNA may then be detected using an anti-BRDU antibody (see Hoshino et al.,
1986, Int. J. Cancer 38, 369; Campana et al., 1988, J. Immunol. Meth. 107, 79).

A proliferating cell nuclear antigen (PCNA) assay may also be used to identify
genes involved in cell proliferation. PCNA (a.k.a. cyclin or the polymerase δ-associated
protein) is a 36 kilodalton protein whose expression is elevated in proliferating cells.
PCNA is synthesized in early G1 and S phases of the cell cycle and therefore serves as an
excellent marker for proliferating cells. Positive cells are identified by immunostaining
using an anti-PCNA antibody (see Li et al., 1996, Current Biology 6, 189; Vassilev et al.,
1995, J. Cell Sci. 108, 1205).



5.4.2. STRESS SIGNALING PATHWAY 

p53 is an important modulator of the stress response. p53-dependent transcriptional
activation may therefore be used to identify genes involved in a stress signaling pathway. A
readout cell population containing a reporter gene under the control of a p53-inducible
promoter may be used for the assay. Suitable reporter genes include, but are not limited to,
β-galactosidase (β-gal), chloramphenicol acetyltransferase (CAT), and luciferase. Positive
cells may be identified by blue color in a β-gal reporter gene assay (see e.g. Komarova et
al., 1997, EMBO J. 16, 1391-1400) or by immunostaining for the reporter gene product. A
p53 induction assay may also be used to identify genes involved in a stress signaling
pathway. p53 induction (i.e. increases in cellular p53 protein expression) may be identified
by immunostaining using a specific anti-p53 antibody (Anker et al., 1993, Int. J. Cancer 55,
982; Weiss et al., 1993, Int. J. Cancer 54, 693).

A heat shock transcription factor 3 (HSF3) aggregation assay may also be used to
identify genes in a stress signaling pathway. The HSF3 aggregation assay measures HSF3
aggregation in the nucleus induced by cellular stress signals through immunostaining using
a specific anti-HSF3 antibody (Kanei-Ishii et al., 1997, Science 277, 246).

An activated c-Jun kinase assay may be used to identify genes in a stress signaling
pathway. c-Jun kinase (JNK) is a protein kinase which is activated by phosphorylation
(p-JNK). Many stress signals result in activation of c-Jun kinase by phosphorylation
(Derijard et al., 1994, Cell 76, 1025). The availability of p-JNK specific antibodies (Santa
Cruz) allows in situ detection of cells in which JNK is activated by heterologous library
genes.

5.4.3. LOSS OF INVASIVENESS 

Invasion inhibition assays may be used to identify genes involved in cancer. One
such assay measures induction of E-cadherin-mediated cell-cell adhesion. The induction of
E-cadherin-mediated adhesion can result in phenotypic reversion and loss of invasiveness of
epithelial cells. This assay measures increased expression of E-cadherin at the cell junction
through immunostaining using a specific anti-E-cadherin antibody (Hordijk et al., 1997,
Science 278, 1464). Another such assay measures loss of hepatocyte growth factor (HGF)-induced
cell scattering. Loss of HGF-induced cell scattering is correlated with loss of
invasiveness of epithelial cells such as Madin-Darby canine kidney (MDCK) cells. This
assay identifies a cell population which does not scatter in response to HGF
and therefore forms compact colonies (Hordijk et al., 1997, Science 278, 1464).

5.4.4. APOPTOSIS SIGNALING PATHWAY 

One assay for apoptosis is the terminal deoxynucleotidyl transferase-mediated dUTP
nick-end-labeling (TUNEL) assay. The TUNEL assay is used to measure nuclear DNA
fragmentation, the hallmark of apoptosis in many cell types (see e.g. Lazebnik et al., 1994,
Nature 371, 346), by following the incorporation of fluorescein-dUTP (Yonehara et al.,
1989, J. Exp. Med. 169, 1747). These assay kits are commercially available through
suppliers such as Clontech and Boehringer Mannheim.

5.4.5. ANTI-PROLIFERATION PATHWAY 

One assay useful for gene identification in an anti-proliferation signaling pathway is
the p15 induction assay. p15 is a member of a family of specific inhibitors of Cdk4 and
Cdk6. The latter are essential for G1 progression into S phase of the cell cycle (Sherr et al.,
1995, Genes & Dev. 9, 1149). The expression of p15 is positively regulated by
transforming growth factor-β (Reynisdottir et al., 1997, Genes & Dev. 11, 492). p15
induction may be identified by immunostaining using a specific anti-p15 antibody available
commercially (e.g. Santa Cruz).

Another assay useful for gene identification in an anti-proliferation signaling
pathway is the p21 induction assay. Increased levels of p21 expression in cells result in
Cdk inhibition, thus resulting in delayed entry into G1 of the cell cycle (Harper et al., 1993,
Cell 75, 805; Li et al., 1996, Current Biology 6, 189). For example, p21 expression can be
elevated by p53 and transforming growth factor-β activities. p21 induction may be
identified by immunostaining using a specific anti-p21 antibody available commercially
(e.g. Santa Cruz).

Yet another assay useful for gene identification in an anti-proliferation signaling
pathway is the p27 induction assay. Like p15 and p21 above, p27 is a member of the
Cdk inhibitor family of proteins. The expression of p27 is increased upon mitogen
withdrawal or contact inhibition (Polyak et al., 1994, Cell 78, 59). p27 induction may be
identified by immunostaining using a specific anti-p27 antibody available commercially
(e.g. Santa Cruz).



5.4.6. WNT SIGNALING PATHWAY

One assay for detection of genes which modulate the Wnt signaling pathway is a
β-catenin induction and/or translocation assay. The activation of the Wnt signaling
pathway results in an increased expression of β-catenin and the translocation of β-catenin
from the cytoplasmic compartment to the nucleus (Kuhl et al., 1997, BioEssays 19, 101).
This assay is used to identify cells and/or cell populations in which the expression of
β-catenin is increased compared to background levels, and/or in which a change of
β-catenin localization occurs, in response to expression of a heterologous gene. Changes in
β-catenin expression or localization are detected using a specific anti-β-catenin antibody
(e.g. Tao et al., 1996, J. Cell Biol. 134, 1271).

Another assay for detection of genes which modulate the Wnt signaling pathway is a
LEF-1 inducible promoter induction assay. β-catenin activates downstream targets in the
Wnt signaling pathway by binding to a transcription factor known as LEF-1, thus resulting
in activation of a LEF-1 inducible promoter (Korinek et al., 1997, Science 275, 1785). A
readout cell line containing a reporter gene, such as β-gal, under a LEF-1 inducible
promoter is used for the assay. When β-gal is used as a reporter gene, positive cells are the
darker blue cells.

5.4.7. STAT SIGNALING PATHWAY 

The STAT (signal transducers and activators of transcription) signaling pathway is
activated by many growth factors and cytokines and plays essential roles in cell
differentiation, cell cycle control, and development. There are six known members of the
STAT transcription factor family. Each STAT family member (except STAT2) is known to
recognize a specific DNA binding sequence (Ihle, 1996, Cell 84, 331). The assay employs a
readout cell line containing a reporter gene, such as β-gal, under the control of any of these
known STAT-inducible promoters (White et al., 1996, Cytokine Growth Factor Rev. 7,
303). Positive cells stain dark blue when β-gal is used as the reporter gene. This assay may
be used to identify genes in a STAT1 signaling pathway, a STAT3 signaling pathway, a
STAT4 signaling pathway, and/or a STAT5/STAT6 signaling pathway. Since STAT5 and
STAT6 share the same DNA recognition site, the assay does not distinguish between these
two STAT pathways. Readout cells expressing a gene which activates a particular STAT
transcription factor will produce a positive [...]
just upstream in the particular STAT pathway assayed.

5.4.8. MAP KINASE SIGNALING PATHWAY 

MAP kinase signaling pathway genes may be identified using a p-ERK assay. The
activation of this signal transduction pathway by certain growth factors, hormones and
neurotransmitters is mediated through two closely-related MAP kinases, p44 and p42, also
known as ERK1 and ERK2. ERK proteins are activated by dual phosphorylation at specific
tyrosine and threonine sites. The p-ERK assay is used to identify genes by immunostaining
readout cells with an antibody which specifically detects the presence of phosphorylated
ERK (p-ERK). Such p-ERK antibodies, which only recognize phosphorylated ERK1 and
ERK2, may be obtained commercially (e.g. Santa Cruz). See Boulton et al., 1991, Cell 65,
663.
5.4.9. AP-1 SIGNALING PATHWAY

Genes in an AP-1 signaling pathway may be identified using a c-fos induction
readout assay. The AP-1 signaling pathway is involved in cell proliferation, cell survival
and cell stress. Activation of the AP-1 signaling pathway results in an increased expression
of genes under the control of an AP-1 promoter sequence such as the c-fos gene (see e.g.
Karin et al., 1997, Curr. Opin. Cell Biol. 9, 240). The c-fos induction assay identifies genes
expressed in cell populations in which the level of endogenous c-fos protein is increased by
immunostaining c-fos using a specific anti-c-fos antibody (Telford et al., 1996, J. Comp.
Neurol. 375, 601).

5.4.10. CREB SIGNALING PATHWAY 

In one embodiment, genes in a cyclic adenosine monophosphate response element 
binding protein (CREB) signaling pathway may be identified using a phosphorylated CREB 
(p-CREB) readout assay. CREB is activated by phosphorylation following an increase in 
the intracellular concentration of cAMP or Ca²⁺. An antibody which specifically recognizes 
phosphorylated CREB allows detection of an activated CREB pathway in readout cells 
(Ginty et al., 1994, Cell 77, 713). 

In another embodiment, genes in a CREB signaling pathway may be identified using 
a cyclic adenosine monophosphate response element (CRE) reporter gene assay. In this 
assay, a readout cell containing a reporter gene (e.g. β-gal, CAT or luciferase) under the 
control of the CRE is used. Positive cells may be identified by, e.g., blue 
staining in a β-gal assay (Himmler et al., 1993, J. Recept. Res. 13, 79; Kruger et al., 1997, 
Naunyn Schmiedebergs Arch. Pharmacol. 356, 433) or by immunostaining for the reporter 
gene product. 

5.4.11. NFκB SIGNALING PATHWAY 

In one embodiment, an NFκB translocation assay may be used to identify genes in 
an NFκB signaling pathway. Activation of the NFκB signaling pathway results in 
translocation of NFκB from the cytoplasm to the nucleus. The NFκB translocation assay 
identifies cells with NFκB translocated to the nucleus by immunostaining for NFκB using a 
specific anti-NFκB antibody (Han et al., 1997, J. Biol. Chem. 272, 9825; Janssen et al., 
1995, Adv. Cancer Res. 151, 389). 

In another embodiment, an NFκB reporter gene assay may be used to identify genes 
in an NFκB signaling pathway. In this assay, a readout cell containing a reporter gene (e.g. 
β-gal, CAT or luciferase) under the control of an NFκB response element is used. 
Positive cells may be identified, e.g., by blue staining in a β-gal assay or by 
immunostaining for the reporter gene product (Rothe et al., 1995, Science 269, 1424). 



5.4.12. NFAT SIGNALING PATHWAY 

Genes in an NFAT signaling pathway may be identified using an NFAT reporter gene 
assay. In this assay, a readout cell expressing a reporter gene (e.g. β-gal, CAT or luciferase) 
under the control of an NFAT response element is used. Positive clones may be identified 
by blue staining in a β-gal assay (see e.g. Burres et al., 1995, J. Antibiot. 48, 380) or by 
immunostaining for the reporter gene product. 

5.4.13. INSULIN SIGNALING PATHWAY 

Genes in the insulin signaling pathway may be identified using a GLUT4 
translocation assay. Insulin stimulation of adipocytes results in translocation of the GLUT4 
glucose transporter to the plasma membrane. This assay identifies cells in which the insulin 
signaling pathway is activated by immunostaining GLUT4 protein localized at the plasma 
membrane (Martin et al., 1996, J. Biol. Chem. 271, 11605). 

5.4.14. MDR SIGNALING PATHWAY 

Genes in the multiple drug resistance (MDR) gene regulation pathway may be 
identified using an MDR reporter gene assay. MDR gene expression is often greatly 
increased in cancer cells resistant to chemotherapy. In this assay, a readout cell containing a 
reporter gene (e.g. β-gal, CAT or luciferase) under the control of an MDR gene promoter 
may be used. Positive cells may be identified by blue staining in a β-gal assay 
(Walther et al., 1997, Gene Ther. 4, 544) or by immunostaining using an antibody specific 
for the reporter gene product. 

5.4.15. CHOLESTEROL TRANSPORT PATHWAY 

Genes important in a cholesterol transport pathway may be identified using an 
intracellular cholesterol accumulation assay. For example, mutations of the Niemann-Pick 
type C (NP-C) gene result in lysosomal accumulation of low density lipoprotein 
(LDL)-derived cholesterol. The accumulated cholesterol in the cytoplasm is detected by 
staining with filipin, a specific cytochemical marker of unesterified cholesterol. The filipin 
staining assay may be used to identify cells with cholesterol accumulation due to the 
expression of an exogenous sense or anti-sense cDNA (see Eugene et al., 1997, Science 277, 
228). 

5.5. BIOCHEMICAL READOUT ASSAYS 
In the practice of this invention, biochemical readout assays may be used to identify 
genes modifying specific activities following in vitro transcription and translation. Such 
biochemical readout assays include, but are not limited to, enzymatic and receptor-based 
assays. There are a wide variety of assays for enzymatic activities and receptor-binding 
activities which may be adapted for use in identification of new genes upon screening a 
library of interest, as further exemplified in this Section below. 

There are many resources available describing such enzymatic and receptor-based 
assays suitable for use with the methods of the invention. For example, Methods in 
Enzymology is a multi-volume reference published by Academic Press which describes 
biological methods, including enzymatic and receptor-based assays, in detail. Further, 
Fernandez-Botran and Vetvicka [...] 
describes assays for immune cell activation, including cytokine receptor assays. 

Biochemical readout assays may include, e.g., detection of: GABA receptor activity, 
glutamate receptor activity, monoamine oxidase activity, nitric oxide synthetase activity, 
opiate receptor activity, serotonin receptor activity, adenosine A1 agonist and antagonist 
activity, adrenergic α1, α2, β1 agonist and antagonist activity, calcium channel blocker 
activity, inflammatory mediator activity, such as the interleukins (e.g. IL-1, IL-6), tumor 
necrosis factor activity, arachidonic acid activity and phosphatase activity (e.g. tyrosine 
phosphatase). Further, biochemical readout assays may include, for example, binding to 
a protein domain or subdomain, for example, a PDZ domain, a PH domain, an SH2 domain, 
or an SH3 domain. Still further, biochemical readout assays may include binding to a 
molecule, for example, phosphotyrosine or phosphorylated inositol. A functional 
assignment given to a particular gene may be derived from results obtained in more than 
one assay. Indeed, it is preferred that a functional assignment be derived from results 
obtained in a panel of two or more assays. Generally, one skilled in the art would know 
which assays are appropriate to employ to best identify genes having, or modifying, a 
particular function-of-interest. 
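
The preference for a panel of two or more assays can be expressed, as a minimal sketch only, by requiring that a clone's functional assignment be supported by at least two distinct positive assays. The function name `assign_functions`, the tuple layout of `results`, and the `min_assays` parameter are illustrative assumptions and are not part of the methods described herein.

```python
from collections import defaultdict


def assign_functions(results, min_assays=2):
    """Assign a function to a clone only when enough assays agree.

    results: iterable of (clone_id, assay_name, function, positive) tuples,
             a hypothetical record format chosen for this sketch.
    min_assays: number of distinct positive assays required per function.
    """
    # Collect the distinct assays that scored positive for each
    # (clone, function) pair.
    support = defaultdict(set)
    for clone_id, assay_name, function, positive in results:
        if positive:
            support[(clone_id, function)].add(assay_name)

    # Keep only the functions backed by at least `min_assays` assays.
    assignments = defaultdict(list)
    for (clone_id, function), assays in support.items():
        if len(assays) >= min_assays:
            assignments[clone_id].append(function)
    return {clone: sorted(funcs) for clone, funcs in assignments.items()}
```

For example, a clone that is positive in both a p-CREB assay and a CRE reporter assay would be assigned to the CREB pathway, while a clone positive in a single assay would remain unassigned under this threshold.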

Further specific examples of assays based on enzymes or receptors include the 
following: acetylcholinesterase; aldol-reductase; angiotensin converting enzyme (ACE); 
cyclooxygenases; DNA repair; β-glucuronidase; lipoxygenases; monoamine oxidases; 
phospholipase A2; platelet activating factor (PAF); potassium channel assays; prostaglandin 
synthetase; serotonin re-uptake activity; and steroid receptors. Additional assays may 
include ATPase inhibition, benzopyrene hydroxylase inhibition, HMG-CoA reductase 
inhibition, phosphodiesterase inhibition, protease inhibition, and tyrosine kinase inhibition. 

5.6. USER-DEFINED ASSAYS 
The methods of the invention are not limited to use with the readout assays 
described herein. Such readout assays merely serve to exemplify a few of the myriad 
possibilities suitable for use with the invention. When the readout assay is a cellular 
readout assay, virtually any cell line identified as suitable by one skilled in the art may be 
used. Further, virtually any reporter gene, or endogenous gene functioning as a reporter 
gene, identified as suitable by one skilled in the art may be used. It will be well noted by 
[...] 
readout assay, whether the assay be cellular or biochemical. 

The skilled practitioner will recognize that it is the particular readout assay, whether 
chosen from the literature or designed by the user, which determines the type (i.e. function) 
of genes identified. For example, if one wishes to identify genes associated with cancer, 
one may choose to screen the library of interest using the p53 and/or MDR assays described 
above. Often, the user will provide the most appropriate readout assay to be employed for 
identification of particular genes-of-interest. 

5.7. AUTOMATION 

It is preferred that automation technology be applied throughout the entire functional 
gene identification process. Many steps in the overall process are amenable to such 
automation. For example, robotic colony picking may be used for building a library of 10[...] 
clones from plates containing well-isolated colonies. Robots suitable for this purpose are 
available commercially from, e.g., Qiagen, Genetix, etc. Similarly, transfection of retroviral 
vectors into producer cells, and in situ transfection of bar-coded, sorted libraries into 
readout cells, are repetitive operations suitable for robotic automation. Further, the system 
is suitable for automated immunostaining of the co-culture, and for automated microscope 
viewing of the immunostained result. Only one population of bar codes is needed for all 
screenings and the same nucleic acid array can be used repeatedly. Automation can be 
applied to hybridization to an array such that the same hybridization conditions are used for 
various libraries. Automation can also be applied to in situ transfections and in situ 
bioassays. 
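
The reuse of a single bar code population across libraries may be pictured, purely as a sketch, as a lookup from each clone's bar code to the array locus carrying its complement. The names `reverse_complement` and `sort_library`, the dictionary layouts, and the short example sequences are hypothetical; the sketch assumes exact Watson-Crick complementarity and ignores hybridization conditions, mismatches and kinetics.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def sort_library(array_loci, clones):
    """Place each bar-coded clone at the locus holding its complement.

    array_loci: dict mapping a locus (e.g. a (row, column) tuple) to the
                bar code immobilized at that locus.
    clones:     dict mapping a clone identifier to the bar code exposed
                on its vector.
    Returns a dict mapping each occupied locus to a list of clone ids.
    Clones whose bar code matches no locus are left out.
    """
    # Index the array by the sequence each locus would capture.
    captured_by = {reverse_complement(code): locus
                   for locus, code in array_loci.items()}
    placement = {}
    for clone_id, code in clones.items():
        locus = captured_by.get(code.upper())
        if locus is not None:
            placement.setdefault(locus, []).append(clone_id)
    return placement
```

Because `array_loci` is fixed, the same lookup can be applied to any number of libraries built in the same bar-coded vector population, which mirrors the statement that one array can be used repeatedly.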


6. EXAMPLES 

The following examples are provided merely as illustrative of several embodiments 
and should not be construed to limit in any way the invention. 



6.1. ASSAYS FOR CELLULAR PROLIFERATION 

In the proliferation/anti-proliferation assays described in this example, the function 
of genes identified will depend on the type of library screened. For example, if a sense 
cDNA library is screened, genes associated with proliferation will be identified. By 
contrast, if an antisense cDNA library is screened, genes associated with anti-proliferation 
will be identified. 
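
The dependence of the result on library orientation can be stated compactly, as an illustrative sketch only. The function name `expected_gene_class` and its string labels are assumptions introduced here for clarity and are not terms of the specification.

```python
def expected_gene_class(library_type: str) -> str:
    """Return the class of genes a positive proliferation readout identifies.

    A sense cDNA library yields genes associated with proliferation;
    an antisense cDNA library yields genes associated with
    anti-proliferation.
    """
    kind = library_type.strip().lower()
    if kind == "sense":
        return "proliferation"
    if kind == "antisense":
        return "anti-proliferation"
    raise ValueError(f"unknown library type: {library_type!r}")
```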

In one assay, PCNA immunostaining is performed using readout cells that are 
starved for at least 24 hours in low serum (e.g. 0.5%) medium prior to tetracycline 
induction. After 12-24 hours of tetracycline treatment, the cells are fixed in cold methanol 
(e.g. 5 minutes at -20°C) and air dried. Cells are blocked with 10% normal goat serum in 
phosphate buffered saline (PBS) for 30 minutes at 37°C. This is followed by incubation 
with a 1:10 dilution of anti-PCNA antibody (e.g. PC10 antibody, Dakopatts) for 2 hours at 
37°C. Cells are rinsed with PBS (e.g. three times for 5 minutes each) and incubated with 
[...] anti-mouse IgG antibody conjugated with fluorescein isothiocyanate (FITC; NBL, 
Nagoya) for 45 minutes at 37°C. After washing, [...] 
[...] 3% hydroquinone, 50% glycerin, pH 8-9). Observation is 
performed using a fluorescence microscope. 

BRDU incorporation is performed using readout cells similarly 
starved as above. 10 µM 5-bromo-2'-deoxyuridine [...] induction, 
or a few hours later (e.g. 2-12 hours). After 24 [...], the readout 
cells are fixed in absolute methanol for 10 minutes at 4°C, and [...] 
rehydrated (e.g. PBS for 3 minutes), DNA is denatured ([...] 
in PBS 3X over 10 minutes), and counterstained [...] 
(i.e. visualization of readout cells staining positive with anti-BRDU antibody). 
6.2. ASSAYS FOR p53 REGULATION 

[...] screened [...] 
antisense library is used. The p53 assays may [...] 
levels or by measuring levels of a reporter gene operably linked to a p53 promoter [...] 
described below. 

For an assay for p53 induction [...] endogenous p53 expression is activated, 
readout cells are treated with tetracycline for [...] gene expression, 
prior to anti-p53 antibody staining [...] are fixed with 1:1 
a[...]ethanol at -20°C for 10 minutes [...] followed by blocking with 
[...] a 1 hour incubation. After washing with PBS three times, [...] 
anti-mouse IgG antibody (Cappel) may be used for detection. 

For an assay of p53 induction using a reporter gene, a p53 promoter operably linked 
to a β-gal reporter gene may be used. Readout cells containing the β-gal gene under the 
[...] from transgenic mice (see Komarova et al., 
1997, EMBO J. 16, 1391-1400) [...] 
gene under control of the p53 promoter. Such readout cells are induced by tetracycline, 
fixed with 1% glutaraldehyde in PBS, washed three times with PBS, and stained [...] 
staining [...] 

6.3. HSF INTRACELLULAR TRANSLOCATION ASSAY 

A heat shock transcription factor (HSF) intracellular translocation assay may be 
used to identify genes which are associated with [...] 
nucleus. One or more HSF transport inducer genes may be [...] 
screened. [...] 
be identified if an antisense library is screened. 

[...] goat serum in [...] 

The preparation may be washed and mounted as described above. 

6.4. CHOLESTEROL TRANSPORT ASSAY 

[...] The filipin staining solution is prepared by 
dissolving 2.5 mg of filipin in 1 ml [...] 
PBS. The stained cells are washed three times with Dulbecco's PBS and mounted with 
glycerol/gelatin containing 1% phenol. 

6.5. LIBRARY MUTATIONAL ANALYSIS OF A SINGLE GENE 

[...] provides a [...] 
screening methods described herein. [...] 
fidelity (see e.g. Pace et al., 1992, Proc. Natl. Acad. Sci. U.S.A. 89:5467; Leung et al., 
1989, Technique 1:11). The mutagenized library may [...] 
be obtained through inverse PCR using [...] 5'-truncated primers (Pues et al., [...] 
Res. 25:1303). 

6.6. CONSTRUCTION OF A BIOLOGICAL ARRAY VECTOR 

Suitable vectors for construction of biological arrays include plasmid [...] or p[...] 

[...] microbial [...] used to construct a biological array is yeast, yeas[...] 
etc. (available from New England BioLabs) may be used. 

6.7. MANUALLY-SORTED cDNAs AND OLIGONUCLEOTIDES 
FOR USE IN IN SITU TRANSFECTION AND CELLULAR 
READOUT ASSAYS 

Described hereinabove is the use of nucleic acid arrays for sorting bar-coded cDNA 
[...] described herein. Such manual sorting has the advantage of not 
[...] sorted cDNA population can be considered to be another form of a nucleic acid array. Such 
[...] herein so long as the cDNA is cloned into an expression vector which is capable of 
[...]. In this way, a manually-sorted nucleic acid array can be used to analyze a 
collection of full-length genes-of-interest from any given source. 

A manually-sorted single-s[...] 
used for the transfection procedures and cellular readout assays described [...] 

[...] sorted oligonucleotide array may [...] 

[...] sorted oligonucleotide array may be obtained through [...] 
oligonucleotides onto a solid support (e.g. nitrocellulose or nylon). Such an app[...] 

population of antisense ol[...] 
particular target gene, such as the ras oncogene. 

The invention described and claimed herein is not to be limited in scope by the 
specific embodiments herein disclosed since these embodiments are intended as [...] 
the scope of this invention. Indeed, various modifications of the invention in addition to 
those shown and described herein will become apparent to those skilled in the art from the 
[...] 
[...] contents are hereby incorporated by reference into the present application in their 
entireties. 


We claim: 

1. A method for conducting a biological readout assay used to screen a bar- 
coded cDNA library comprising: 

(a) [...] a nucleic acid array; 

(b) transfecting the library sorted in step (a) into [...] 

(c) conducting the biological readout assay. 

2. The method of Claim 1, wherein the nucleic acid array is a biological array 
or a gene chip. 

3. The method of Claim 2, wherein the biological array comprises a population 
of vectors, each vector containing a different bar code complementary to a bar-coded 
cDNA library to form a population of complementary bar codes, wherein the population of 
vectors is immobilized on a support. 

4. The method of Claim 3, wherein the population of complementary bar codes 
consists of from 10[...] to 10[...] complementary bar codes. 

5. The method of Claim 3, wherein the support is formed of nitrocellulose or 
nylon. 

6. The method of Claim 1, wherein transfecting in step (b) is carried out using a 
chemical transfectant or electroporation. 

7. The method of Claim 1, wherein the readout cell line is NIH3T3 cells 
[...] a reporter gene under the control of a response element or promoter. 

8. [...] luciferase and chloramphenicol acetyltransferase. 

9. The method of Claim 7, wherein the response element or promoter is 
selected from the group consisting of an NFκB response element, an NFAT response 
[...] 
a LEF-1-inducible promoter and a p53-inducible promoter. 

10. The method of Claim 1, wherein expression of the bar-coded cDNA library 
is tetracycline inducible or estrogen inducible. 

11. The method of Claim 1, wherein the biological readout assay is capable of 
detecting genes in a pathway selected from the group consisting of a mitogenic signaling 
pathway, a STAT signaling pathway, an NFκB signaling pathway, a stress signaling 
pathway, an apoptosis signaling pathway, an NFAT signaling pathway, a Wnt signaling 
pathway, a CREB signaling pathway, an AP-1 signaling pathway, a proliferation signaling 
pathway and an anti-proliferation signaling pathway. 



12. A method for conducting a biological readout assay used to screen a bar- 
coded cDNA library comprising: 

(a) sorting the bar-coded cDNA library using a nucleic acid array having a 
plurality of concave loci; 

(b) expressing the bar-coded cDNA library sorted in step (a) using in vitro 
transcription and translation to produce a population of proteins; and 

(c) screening the population of proteins produced in step (b) for an activity-of- 
interest, 

so as to conduct the biological readout assay. 

13. The method of Claim 12, wherein the activity-of-interest screened in step (c) is 
selected from the group consisting of a receptor-binding activity, a [...] 
and a growth factor activity. 

14. The method of Claim 12, wherein screening is carried out by immobilizing the 
population of proteins on a solid support and conducting a binding assay with the 
immobilized population of proteins. 

15. The method of Claim 14, wherein the solid support is formed of nitrocellulose 
or nylon. 

16. The method of Claim 12, wherein screening is carried out by placing the 
population of proteins in contact with readout cells and conducting a biological activity 
assay. 

17. A method for identifying one or more genes-of-interest in a pre-sorted cDNA 
[...] assay, 

to identify one or more genes-of-interest. 

18. The method of Claim [...] 
bar-coded cDNA library hybridized to a nucleic acid array. 

19. [...] 
transfectants or electroporation. 

20. The method of Claim 17, wherein the biological readout assay identifies one 
[...] 
signaling pathway and an anti-proliferation signaling pathway. 

21. A method of expression cloning one or more genes-of-interest in a cDNA 
library comprising: 

(a) sorting the cDNA library; 

(b) transfecting the sorted library into a readout cell line; and 




PCT/US99/08823 



WO 99/55886 



(c) identifying a positive signal [...] library in a 
biological readout assay, 

22. [...] using a nucleic acid array. 

23. The method of Claim [...] 
out using chemical transfectants or electroporation. 

24. The method of Claim 21, wherein the positive signal is identified by 
immunocytochemistry. 

25. [...] signaling pathway and an anti-proliferation signaling pathway. 

26. A method of sorting a cDNA library for use in an expression cloning assay 
[...] 

(a) cloning a population of cDNA inserts into a population of bar-coded 
[...] 

(b) [...] by exposing only the bar code region in single-stranded 
form; and 

(c) hybridizing the population of bar-coded vectors to a nucleic acid 
array to sort the cDNA library. 

27. [...] group consisting of a gene chip and a biological array. 

28. The method of Claim 26, wherein preparing the population of bar-coded 
vectors for hybridization to a DNA array by exposing only the bar code region in single- 
stranded form [...]: 

(a) digesting the population with a restriction endonuclease to linearize 
the population; 

(b) binding a DNA-binding protein to at least two sites on the 
population; and 

(c) digesting the population bound in step (b) to expose the single- 
stranded bar code region. 

29. [...] AP1, SP1 and p53. 

30. The method of Claim 28, wherein the restriction endonuclease is selected 
from the group consisting of NotI, SfiI and EcoRI. 

31. The method of Claim [...] 
is carried out using an enzyme selected from the group consisting of exonuclease III, T[...] 
DNA polymerase. 

32. A method for conducting a biological readout assay used to screen a bar- 
coded cDNA library comprising: 

(a) sorting the bar-coded cDNA library using a nucleic acid array having a 
plurality of concave loci; 

(b) [...] 
solution which facilitates release of the bar-coded cDNA library from the 
nucleic acid array to carry out in situ transfection; and 

(c) conducting the biological readout assay. 

33. The method of Claim 32, wherein the nucleic acid array is a biological array 
or a gene chip. 

34. The method of Claim 33, wherein the biological array comprises a 
[...] containing a different bar code complementary to a bar[...] 
[...] population of vectors is immobilized on a support. 

35. The method of Claim 34, wherein the population of complementary bar 
codes consists of from 10[...] to 10[...] complementary bar codes. 

36. The method of Claim 34, wherein the support is formed of nitrocellulose or 
nylon. 

37. The method of Claim 32, wherein transfecting in step (b) is carried out using a 
chemical transfectant or electroporation. 

38. The method of Claim 32, wherein the readout cell line is NIH3T3 cells 
[...] 

39. The method of Claim 38, wherein the reporter gene is selected from the 
group consisting of [...] 

40. The method of Claim 38, wherein the response element or promoter is 
selected from the group consisting of an NFκB response element, an NFAT response 
element, a cyclic adenosine monophosphate response element, a STAT-inducible promoter, 
a LEF-1-inducible promoter and a p53-inducible promoter. 

41. The method of Claim 32, wherein expression of the bar-coded cDNA library 
is tetracycline inducible or estrogen inducible. 

42. The method of Claim 32, wherein the biological readout assay is capable of 
detecting genes in a pathway selected from the group consisting of a mitogenic signaling 
pathway, a STAT signaling pathway, an NFκB signaling pathway, a stress signaling 
[...] 
pathway and an anti-proliferation signaling pathway. 